What You See Is What You Get Element

DDL Optimization in Oracle Database 12c

By Mohamed Houri


Introduction

Starting from Oracle 11g, a new powerful optimization technique improving the performance of Data Definition Language operations has been implemented. When you add a not null column to an existing table and in the same time you would like to attach a constant default value to this newly added column then a new DDL optimization kicks in making the DDL operation instantaneously accomplished. How could this be possible when the altered table has millions of records which should have their newly added column updated with the constant default value? And does this new optimization come without side effects that we should be aware of before using it? This is what I am going to tell you about in this article.

The Concept

Consider the following table with 3 million worth of rows:

SQL> create table t1
    as select
     rownum n1
     , trunc ((rownum-1)/3) n2
     , trunc(dbms_random.value(rownum, rownum*10)) n3
     , dbms_random.string('U', 10) c1
  from dual
  connect by level <= 3e6;

SQL> desc t1

          Name              Null?   Type
          -----------------------   ---------
   1      N1                        NUMBER
   2      N2                        NUMBER
   3      N3                        NUMBER
   4      C1                        VARCHAR2(4000 CHAR)


Table to which I am going to add an extra not null column having a default value.

SQL> alter table t1 add C_DDL number default 42 not null;


Where I have bolded the two crucial words default and not null because they represent the keys that drive this new feature.

In order to appreciate the difference in the execution time of the above alter table command, I am going to execute it into two different Oracle database releases, 10.2.0.4.0 and 11.2.0.3.0:

10.2.0.4.0 > alter table t1 add C_DDL number default 42 not null;

Table altered.
Elapsed: 00:00:48.53

11.2.0.3.0> alter table t1 add C_DDL number default 42 not null;

Table altered.
Elapsed: 00:00:00.04


Notice the difference in the execution times. The C_DDL column has been added instantaneously in 11gR2 database while it took almost 49 seconds in 10gR2. What is this new mechanism that allows such an extremely rapid execution time when adding a not null column with default value to an existing table?

How could 3 million of rows be updated in 4 milliseconds?

Let’s verify visually if the update has been really done (from now and on when the Oracle version is not specified it will then refer to 11.0.2.3)

SQL> select count(1) from t1;

   COUNT(1)
  ----------
   3000000

SQL> select count(1) from t1 where c_ddl = 42;

   COUNT(1)
  ----------
   3000000 


Although Oracle has altered the t1 table instantaneously, the query is showing that the whole bunch of C_DDL column has been updated with their default value set to 42. How could this be possible? Will the execution plan be of any help here?

SQL> select * from table(dbms_xplan.display_cursor);

---------------------------------------------------------------------------
| Id | Operation        | Name | Rows | Bytes | Cost (%CPU)| Time          |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT  |      |      |       | 3016 (100) |               |
| 1 | SORT AGGREGATE    |      |    1 |     3 |            |               |
|*2 | TABLE ACCESS FULL | T1   | 2999K|  8788K| 3016   (5) | 00:00:10      |
---------------------------------------------------------------------------

 Predicate Information (identified by operation id):
 ---------------------------------------------------
     2 - filter(NVL("C_DDL",42)=42) 


Notice here again how the predicate part of the above execution plan can reveal vital information when trying to understand what is happening behind the scene. Despite I haven’t used the NVL function in my query, this one appears in the predicate part indicating that, internally, Oracle is still considering the C_DDL column as to be potentially able to contain null values (which means it has not been updated) and, as such, Oracle is replacing it with its default value 42.

We have the earlier release as a baseline to compare and locate the difference:

10.2.0.4.0 > select count(1) from t1 where c_ddl = 42;

  COUNT(1)
 ----------
  3000000

10.2.0.4.0> select * from table(dbms_xplan.display_cursor);

---------------------------------------------------------------------------
| Id | Operation        | Name | Rows | Bytes | Cost (%CPU)| Time          |
---------------------------------------------------------------------------
|  0 | SELECT STATEMENT |      |      |       | 4001 (100) |               |
|  1 | SORT AGGREGATE   |      |     1|    3  |            |               |
|* 2 | TABLE ACCESS FULL| T1   | 3000K| 8789K | 4001   (8) | 00:00:09      |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("C_DDL"=42) 


The absence of the NVL function in the predicate part together with the time it took to add the column in 10gR2 (00:00:48.53) explain the working concept introduced in 11gR1 to optimize adding not null defaulted column to an existing table.

Simply stated, ever since Oracle 11gR1 when you add a not null column with a default value, Oracle will not update all existing rows with this default value. Instead Oracle will store a metadata for this new column (not null constraint and default value 42) and allow the column to be added almost instantaneously whatever the size of the altered table is. Of course this is possible at the cost of adding a NVL function when retrieving the added column from a table block.

Having explained this wonderful concept of DDL optimization, I am going, in the next section, to investigate a little bit on how this feature is managed by Oracle to ensure rapidity of DDL and guarantee of correct and performant results during data retrieving. We will see particularly the difference that exists between retrieving the added column from a table block in contrast to when the same column is visited via an index leaf block.

The Working Mechanism on the Altered Table

We saw above that we gain performance when adding a not null column with default value. But we saw also that this has been made possible because Oracle introduces a NVL function which is applied to the added column so that it mimics null C_DDL to be equal to their default value thanks to the corresponding metadata stored into the data dictionary. Does this implicit use of NVL function introduce a side effect?

Well, first we saw above that this has no influence on the estimations done by the CBO as this one is accurately estimating the number of rows to be generated as shown below:

SQL> select /*+ gather_plan_statistics */ count(1) from t1 where C_DDL = 42;

  COUNT(1)
 ----------
  3000000

SQL> select * from table(dbms_xplan.display_cursor(null,null,’ALLSTATS LAST’));

---------------------------------------------------------------------------
| Id | Operation        | Name | Starts | E-Rows | A-Rows | A-Time         |
---------------------------------------------------------------------------
| 0  | SELECT STATEMENT |      |      1 |        |       1| 00:00:00.37    |
| 1  | SORT AGGREGATE   |      |      1 |       1|       1| 00:00:00.37    |
|* 2 | TABLE ACCESS FULL| T1   |      1 |   2999K|   3000K| 00:00:00.44    |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(NVL("C_DDL",42)=42) 


But there is an extra time (44ms) when full scanning the table blocks, probably due to the new filter which uses the NVL function when compared to the execution time of the same operation in earlier release (5ms):

10.2.0.4.0> select /*+ gather_plan_statistics */ count(1) from t1 where C_DDL = 42

  COUNT(1)
 ----------
  3000000

10.2.0.4.0> select * from table(dbms_xplan.display_cursor(null,null,’ALLSTATS LAST’));

---------------------------------------------------------------------------
| Id | Operation        | Name | Starts | E-Rows | A-Rows | A-Time        |
---------------------------------------------------------------------------
| 1  | SORT AGGREGATE   |      |      1 |      1 |      1 | 00:00:01.06   |
|* 2 | TABLE ACCESS FULL| T1   |      1 |   3000K|   3000K| 00:00:00.05   |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
    2 - filter("C_DDL"=42) 


Working Mechanism on an Indexed C_DDL Column

When a function is applied to a column that figures into the predicate part, it will preclude the use of any index that might exist on this column. In this particular case will the NVL function which is applied to the C_DDL column pre-empt an index to be used by the CBO if this column is indexed? That’s what we are going to see herein after

Consider the following index:
 

SQL> create index i1_c_ddl on t1(c_ddl);

Index created.

Elapsed: 00:00:02.14


And Re-query again:
 

 SQL> select /*+ gather_plan_statistics */ count(1) from t1 where C_DDL = 42;

  COUNT(1)
 ----------
  3000000


SQL> select * from table(dbms_xplan.display_cursor(null,null,’ALLSTATS LAST’));

---------------------------------------------------------------------------------
| Id | Operation           | Name     | Starts | E-Rows | A-Rows | A-Time        |
----------------------------------------------------------------------------------
| 0  | SELECT STATEMENT    |          |      1 |        |      1 | 00:00:00.47   |
| 1  | SORT AGGREGATE      |          |      1 |      1 |      1 | 00:00:00.47   |
|* 2 | INDEX FAST FULL SCAN| I1_C_DDL |      1 |   2999K|   3000K| 00:00:00.75   |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("C_DDL"=42) 


There is good news to emphasize here: the hidden NVL function is not applied to the C_DDL column when retrieving its value from the index leaf block which explains why the index has been used by the CBO.

But you could argue and say this is a normal behavior: an index cannot contain null column values. So let’s create a composite multi-column index with a not null column to protect not null values of the C_DDL column. Something resembling to this:

SQL> drop index i1_c_ddl;

Index dropped.

SQL> alter table t1 modify n1 not null;

Table altered.

SQL> create index i2_n1_c_ddl on t1(n1,c_ddl);

Index created.


SQL> select /*+ gather_plan_statistics */ count(1) from t1 where n1= 100 and C_DDL = 42;

  COUNT(1)
 ----------
    1
---------------------------------------------------------------------------------
| Id | Operation        | Name        | Starts | E-Rows | A-Rows | A-Time       |
---------------------------------------------------------------------------------
| 0  | SELECT STATEMENT |             |      1 |        |      1 | 00:00:00.01  |
| 1  | SORT AGGREGATE   |             |      1 |      1 |      1 | 00:00:00.01  |
|* 2 | INDEX RANGE SCAN | I2_N1_C_DDL |      1 |      1 |      1 | 00:00:00.01  |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1"=100 AND "C_DDL"=42) 


Even when the added C_DDL column is protected against null values by its presence in a composite index, there is no trace of the hidden NVL function applied to the C_DDL column. This is clearly demonstrating that, in contrast to the table blocks where there is no update of C_DDL column, an index that is created on the same column will see its leaf blocks immediately being populated by the default value of the C_DDL column.

Before finishing this section let me show you one more interesting issue. We have seen so far that, each time the CBO has decided to visit a table block, it applied the NVL function to the C_DDL column in order to ensure retrieving not null C_DDL value (as far as this one has not been updated). But we have seen that this filter is always applied when the table is fully scanned (TABLE ACCESS FULL). Will the CBO apply this NVL function when the t1 table is accessed via index (TABLE ACCESS BY INDEX ROWID)? Let’s engineer a simple case and observe the CBO reaction in this particular situation:

SQL> drop index i2_n1_c_ddl;

SQL> create index i2_n1_c_ddl on t1(n1);

SQL> select /*+ gather_plan_statistics */ count(1) from t1 where n1= 100 and C_DDL = 42;

-------------------------------------------------------------------------------
| Id | Operation                    | Name        | Starts | E-Rows | A-Rows  |
-------------------------------------------------------------------------------
| 0  | SELECT STATEMENT             |             |      1 |        |      1  |
| 1  | SORT AGGREGATE               |             |      1 |      1 |      1  |
|* 2 | TABLE ACCESS BY INDEX ROWID  |          T1 |      1 |      1 |      1  |
|* 3 | INDEX RANGE SCAN             | I2_N1_C_DDL |      1 |      1 |      1  |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(NVL("C_DDL",42)=42)
   3 - access("N1"=100) 


Spot how the NVL function is also applied on the C_DDL column even when the table t1 is visited via index rowid.

We are now confident to say that each time the CBO visits a table block, being it via a single block or a multi-block read, it will apply the NVL function to any “DDL optimized” column it has to filter from those visited table blocks. However, the CBO will not apply the NVL function to the “DDL optimized” column, if this one is acquired from an index leaf block.

Oracle 12c and Data Optimization for NULL Columns

With the arrival of the 12c Oracle release it is legitimate to ask whether the DDL optimization is still available in Oracle 12c or not. A picture being worth a thousand words, let’s repeat the same experiment in this release too:

12c > alter table t1 add C_DDL number default 42 not null;

Elapsed: 00:00:00.02 


Almost instantaneously. DDL optimization kicks in here too as shown and proofed again via the use of the NVL function in the predicate part of the following query:

12c> select count(1) from t1 where c_ddl=42;

  COUNT(1)
 ----------
  3000000

12c> select * from table(dbms_xplan.display_cursor);

---------------------------------------------------------------------------
| Id | Operation        | Name  | Rows | Bytes | Cost (%CPU)| Time         |
---------------------------------------------------------------------------
| 0  | SELECT STATEMENT |       |      |       |  3802 (100)|              |
| 1  | SORT AGGREGATE   |       |    1 |    13 |            |              |
|* 2 | TABLE ACCESS FULL|    T1 | 3538K|    43M|    3802 (1)|  00:00:01    |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(NVL("C_DDL",42)=42)

Note
-----
- dynamic statistics used: dynamic sampling (level=2)


But there is, albeit, a little addition to the DDL optimization in 12c release when compared to the 11gR1 version. In 12c database, the DDL optimization has been extended to include null columns having default value. Consider the following alter table done in 11gR2 and 12c respectively in order to clearly appreciate the difference.

11.2.0.3.0> alter table t1 add C_DDL_2 number default 84;

Table altered.

Elapsed: 00:00:58.25

12c> alter table t1 add C_DDL_2 number default 84;

Elapsed: 00:00:00.02


While adding the nullable C_DDL_2 column took 58 seconds to be honored in 11gR2 it has been instantaneously done in 12c.

This is a clear demonstration that in Oracle Database 12c, DDL optimization has been extended to include null columns having default values. Indeed, when you query t1 table to get the distinct values of the newly added column (C_DDL_2) you will realize that the entire table rows have seen their metadata (default value 84) updated as shown via the following query:

12c> select c_ddl_2, count(1) from t1 group by c_ddl_2;

   C_DDL_2    COUNT(1)
   -------   ----------
    84        3000000


SQL> select count(1) from t1 where c_ddl_2=84;

  COUNT(1)
 ----------
  3000000

SQL> select * from table(dbms_xplan.display_cursor);

---------------------------------------------------------------------------
| Id | Operation        | Name | Rows | Bytes | Cost (%CPU)| Time         |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT  |      |      |       | 3803 (100) |              |
| 1 | SORT AGGREGATE    |      |    1 |    13 |            |              |
|* 2| TABLE ACCESS FULL | T1   | 3538K|    43M|   3803 (1) |  00:00:01    |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
    2 - filter(DECODE(TO_CHAR(SYS_OP_VECBIT("SYS_NC00006$",0)),NULL,NVL("
               C_DDL_2",84),'0',NVL("C_DDL_2",84),'1',"C_DDL_2")=84)

Note
-----
- dynamic statistics used: dynamic sampling (level=2) 


However, in order to ensure DDL optimization for null columns with default value, things became more complex than it used to be for not null columns in the preceding release. We went from a simple implicit use of the NVL function to a complex and exotic predicate part involving SYS_OP_VECBIT Oracle non documented function and a new internal column SYS_NC00006$ in order to honor the default value since this one has not been physically updated.

In contrast to what you might immediately think of, the SYS_NC00006$ column is not a virtual column. It represents a hidden system generated column as shown below:

  12c> SELECT
             column_name
            ,virtual_column
            ,hidden_column
            ,user_generated
       FROM
             user_tab_cols
       WHERE table_name = 'T1'
       AND   column_name = ‘SYS_NC00006$’;

COLUMN_NAME          VIR HID USE
-------------------- --- --- ---
SYS_NC00006$          NO YES NO 

Even though that this column is hidden it doesn’t pre-empt us from selecting it.

  12c> select
          a.c_ddl_2
         ,a.SYS_NC00006$
       from t1 a
       where c_ddl_2 =84
       and rownum <=5;

C_DDL_2  SYS_NC00006$
-------  ------------
     84
     84
     84
     84
     84 


The SYS_NC00006$ column will remain null until the C_DDL_2 column will be given a value that is not equal to the default value 84. Consider the following inserts:

12c> insert into t1 values (0,0,0,'xxxxx',110,130);

1 row created.

12c> insert into t1 values (1,1,1,'xxxxx',140,150);

1 row created.

   12c> insert into t1 values (1,1,1,'xxxxx',200,null);

   12c> select
           a.c_ddl_2
          ,a.SYS_NC00006$
        from t1 a
        where a.c_ddl_2 in (130,150);

  C_DDL_2  SYS_NC00006$
  -------  ------------
      130  01
      150  01

  SQL> select
          a.c_ddl_2
         ,a.SYS_NC00006$
       from t1 a
       where a.c_ddl_2 is null;

  C_DDL_2   SYS_NC00006$
  -------   ------------
            01 


Notice how the SYS_NC00006$ hidden column value is no longer NULL when we insert a non-default value into the C_DDL_2 column (including the explicit NULL value).

Putting together the different pieces of the puzzle, we can easily understand what that exotic, but in fine simple, predicate part reproduced here below is exactly doing this:

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(DECODE(TO_CHAR(SYS_OP_VECBIT("SYS_NC00006$",0)),NULL,NVL("
              C_DDL_2",84),'0',NVL("C_DDL_2",84),'1',"C_DDL_2")=84) 


Oracle is simply checking through its system generated column and via the SYS_OP_VECBIT function whether to consider the default value of the C_DDL_2 column or the real value introduced by an end user or via an explicit insert statement. Let’s mimic what Oracle is doing with our above SYS_NC00006$ column values i.e. ‘01’ and NULL.

    12c> SELECT
            a.c_ddl_2
           ,TO_CHAR(sys_op_vecbit(a.sys_nc00006$,0)) cbo_ddl
         FROM t1 a
         WHERE a.c_ddl_2 IN (130,150)
         UNION ALL
         SELECT
            a.c_ddl_2
           ,TO_CHAR(sys_op_vecbit(a.sys_nc00006$,0)) cbo_ddl
         FROM t1 a
         WHERE a.c_ddl_2 IS NULL
         UNION ALL
         SELECT
            a.c_ddl_2
           ,TO_CHAR(sys_op_vecbit(a.sys_nc00006$,0)) cbo_ddl
         FROM t1 a
         WHERE c_ddl_2 =84
         AND rownum <=1
            order by c_ddl_2 nulls last
            ;

   C_DDL_2       CBO_DDL
   ----------   ---------
       84        {null}
      130          1
      150          1
    {null}         1 


There are 4 distinct values of C_DDL_2 column, the default one (84) and 3 explicitly inserted values 130,150 and null. When you use a predicate against the C_DDL_2 column to retrieve a row from a table block, the Oracle CBO will decode the above CBO_DDL value (based on SYS_NC00006$) to check its value against your input bind (or literal) variable. As such it can mimic correctly all the values of C_DDL_2 column including those having a default value (84) and which have not been physically updated to reflect this default value.

Summary

Oracle 11gR1 came up with a wonderful feature that makes us not worrying at all about the continuity of our application when adding, online, a not null column with default value to a real life big production table. This feature, called DDL optimization, allows such an alter table to be not only instantaneous but also without any need to lock the table. Oracle 12c extended this feature to include null columns with default value. And the icing on the cake is that there seems to be no noticeable side effect on the performance during the retrieve of the altered column even though the appearance in 12c of an exotic but nevertheless inoffensive predicate part ensuring correct values of altered column when this one is acquired from a table block.

About the Author

Mohamed Houri has a PhD in Fluid Mechanics (Scientific Computing) from the University of Aix-Marseille II, preceded by an engineer diploma in Aeronautics. He has been working with the Oracle database for more than 14 years for different European customers as an independent Oracle Consultant specialized in Tuning and Trouble-shooting Oracle performance problems. Mohamed has also worked with the Naval Architect Society of Japan on the analysis of tsunamis and breaking waves using a powerful signal analysis called Wavelet Transform.