TECHNOLOGY: Ask Tom

As Published In
Oracle Magazine
May/June 2011

  

On Deferring and Bulking Up

By Tom Kyte Orace Ace Employee

 

Our technologist prefers not to defer and works to increase bulk.

In your column “On Constraints, Metadata, and Truth,” you wrote about how constraints (or the lack thereof) can really have an impact on query plans. I have a question on constraints: given that you cannot alter an existing constraint to be deferrable—you have to drop it and re-create it as deferrable—would it be OK to create all constraints as deferrable, just in case you wanted to defer them someday?

The short answer is no, you do not want to create all constraints as deferrable “just in case.” The longer answer is still no, but with an explanation.

By default, constraints are implemented in an IMMEDIATE fashion—integrity constraints are checked immediately after the entire SQL statement has been processed.

Since Oracle8 Database, the database program has had the ability to defer constraint checking, which can be quite advantageous for various operations. One that immediately leaps to mind is the requirement to cascade an UPDATE of a primary key to the child keys. Many people claim that you should never need to do this—that primary keys are immutable (I am one of those people)—but many others persist in their desire to have a cascading UPDATE. Deferrable constraints make this possible.

Before Oracle8 Database, it was actually possible to do a CASCADE UPDATE, but doing so involved a tremendous amount of work and had certain limitations. With deferrable constraints in Oracle8 Database and later releases, it becomes almost trivial. The code can look like this: 

SQL> create table parent
 2   (pk int primary key )
 3   /
Table created.
SQL> create table child
 2   (fk constraint child_fk_parent
 3      references parent(pk)
 4      deferrable
 5      initially immediate
 6   )
 7   /
Table created.

SQL> insert into parent values ( 1 );
1 row created.

SQL> insert into child values ( 1 );
1 row created.

 

The code creates a parent table, PARENT, and a child table, CHILD. The CHILD table references the PARENT table, and the constraint used to enforce that rule is called CHILD_FK_PARENT (child foreign key to parent). This constraint was created as DEFERRABLE, but it was set to INITIALLY IMMEDIATE. This means that the constraint can be deferred until the COMMIT or to some other time. By default, however, the constraint will be validated at the statement level. This is the most common use of deferrable constraints. As defined, the CHILD table behaves the same way tables always have, but the deferrable constraint makes it possible to explicitly change the table’s behavior.

Now let’s try some data manipulation language and see what happens: 

SQL> update parent set pk = 2;
update parent set pk = 2
*
ERROR at line 1:
ORA-02292: integrity constraint 
(OPS$TKYTE.CHILD_FK_PARENT) violated –
childrecord found

Because the constraint is in IMMEDIATE mode, this UPDATE fails. I’ll change the mode and try again:

SQL> set constraint 
child_fk_parent deferred;
Constraint set.
            
SQL> update parent set pk = 2;
1 row updated.

 

Now the UPDATE succeeds.

For demonstration purposes, I’ll show how to check a deferred constraint explicitly before committing it, to see if the statement’s modifications are in agreement with the business rules (in other words, to check that the constraint isn’t currently being violated). It’s a good idea to do this check before releasing control to some other part of the program (which may not be expecting the deferred constraints) or committing: 

SQL> set constraint 
child_fk_parent immediate;
set constraint child_fk_parent immediate
*
ERROR at line 1:
ORA-02291: integrity constraint 
(OPS$TKYTE.CHILD_FK_PARENT) violated –
parent key not found

 

The SET CONSTRAINT fails and returns an error immediately, as expected, because the constraint was violated. The UPDATE to PARENT was not rolled back (that would have violated the statement-level atomicity); it is still outstanding. Also note that the transaction is still working with the CHILD_FK_PARENT constraint deferred because the SET CONSTRAINT statement failed.

Let’s continue now by cascading the UPDATE to CHILD: 

SQL> update child set fk = 2;
1 row updated.

SQL> set constraint 
child_fk_parent immediate;
Constraint set.

SQL> commit;
Commit complete.

 

And that’s the way it works. Note that to defer a constraint, you must create it that way—you have to drop and re-create the constraint to change it from NONDEFERRABLE to DEFERRABLE. That might lead you to believe that you should create all of your constraints as DEFERRABLE INITIALLY IMMEDIATE, just in case you want to defer them at some point. In general, however, that is not true. You want to allow constraints to be deferred only if you have a real need to do so. By creating deferred constraints, you introduce differences in the physical implementation (the structure of your data) that might not be obvious. For example, if you create a deferrable UNIQUE or PRIMARY KEY constraint, the index Oracle Database creates to support the enforcement of that constraint will be a nonunique index. Normally you expect a unique index to enforce a unique constraint, but because you have specified that the constraint can temporarily be ignored, the database can’t use that unique index. Other subtle changes will also be apparent, for example, with NOT NULL constraints. If you allow your NOT NULL constraints to be DEFERRABLE, the optimizer will start treating the column as if it supported NULLs—because it, in fact, does support NULLs during your transaction.

Code Listing 1: Creating a table with a DEFERRABLE constraint and inserting data 

SQL> create table t
 2   ( x int constraint x_not_null not null deferrable,
 3   y int constraint y_not_null not null,
 4   z varchar2(30)
 5   );
Table created.

SQL> insert into t(x,y,z)
 2   select rownum, rownum, rpad('x',30,'x')
 3   from all_users;
45 rows created.

SQL> exec dbms_stats.gather_table_stats( user, 'T' );
PL/SQL procedure successfully completed.

 

To demonstrate this, suppose you have a table T with the columns and data shown in Listing 1. In this example, column X is created such that when you do a COMMIT, X will not be NULL. However, during your transaction, X is allowed to be NULL, because the constraint is DEFERRABLE. Column Y, on the other hand, is always NOT NULL. If you were to index column Y— 

SQL> create index t_idx on t(y);
Index created.

 

—and then run a query to count rows that could make use of this index on Y, but only if Y is NOT NULL, as in the following query— 

SQL> select count(*) from t;
———————————————————————————————————
|Id| Operation        | Name| Rows|
———————————————————————————————————
| 0| SELECT STATEMENT |     |    1|
| 1| SORT AGGREGATE   |     |    1|
| 2|  INDEX FULL SCAN |T_IDX|   45|
———————————————————————————————————

 

—you would be happy to see that the optimizer chose to use the small index on Y to count the rows rather than to do a full scan of the entire table T. However, if you dropped that index and indexed column X instead— 

SQL> drop index t_idx;
Index dropped.

SQL> create index t_idx on t(x);
Index created.

 

—and then ran the query to count the rows once more, you would discover that the database does not and, in fact, cannot use your index: 

SQL> select count(*) from t;
————————————————————————————————————
|Id| Operation         | Name| Rows|
————————————————————————————————————
| 0| SELECT STATEMENT  |     |    1|
| 1| SORT AGGREGATE    |     |    1|
| 2|  TABLE ACCESS FULL| T   |   45|
————————————————————————————————————

 

The query did a full scan of the table in order to count the rows. This is because in an Oracle Database B-tree index, the index will not contain an entry for any row in the table such that all of the columns in the index are NULL. Given that X is allowed to be NULL temporarily, the optimizer has to assume that X might be NULL and therefore would not be in the index on X. Hence, a count returned from the index might be different from (wrong in terms of) a count against the table.

You can see that if X had a NONDEFERRABLE constraint placed on it, this limitation would be removed; that is, column X is as good as column Y if the NOT NULL constraint is not DEFERRABLE: 

SQL> alter table t drop 
constraint x_not_null;
Table altered.

SQL> alter table t modify x 
constraint x_not_null not null;
Table altered.

SQL> set autotrace traceonly explain 
SQL> select count(*) from t;
———————————————————————————————————
|Id| Operation       |  Name| Rows|
———————————————————————————————————
| 0| SELECT STATEMENT|      |    1|
| 1| SORT AGGREGATE  |      |    1|
| 2|  INDEX FULL SCAN| T_IDX|   45|
———————————————————————————————————

 

So the bottom line is: Use DEFERRABLE constraints only when you have an identified need to use them. They introduce subtle side effects that can cause differences in your physical database implementation (nonunique versus unique indexes) or query plans—as just demonstrated.

In fact, I would generalize this advice: Use default settings, and deviate from the default settings only when you have a sound reason and on a case-by-case basis. 

Bulking Up

You have often said that if you can use a simple SQL statement, don’t complicate it by writing procedural code. I have the following FOR LOOP UPDATE, which takes a very long time (of course) and consumes a lot of resources. I’d like to simplify it by getting rid of the FOR LOOP, but there are other statements inside the FOR LOOP besides the UPDATE statement. [Note: I edited this code for brevity. See bit.ly/gDONzd for the original code.]  

 FOR increc IN 
 (SELECT * 
  FROM big_table
  ORDER BY many columns) 
LOOP 
      . . . much procedural code goes here. . . .
   UPDATE big_table
SET … 
WHERE primary_key = inrec.primary_key;
   COMMIT; 
END LOOP; 

The big_table table has 27 million rows. Can you please give me some suggestions to make this a better process?

 

I can outline the first step you would take: bulking it up with BULK COLLECT. It would look like the code in Listing 2.

Code Listing 2: Step 1 to improving slow-by-slow processing—bulking up 

declare

cursor c is 
select rowid rid, t.* 
  from big_table t
 order by many columns;

l_limit number := 1000;

begin

loop
    -- l_arrayN is a plsql index by table corresponding 
    -- to your columnN - declare of the type of columnN
    fetch c bulk collect 
     into L_ROWID_ARRAY, 
          l_array1, l_array2, 
          l_array3 ...
    LIMIT l_limit;

    for i in 1 .. l_rowid_array.count
    loop
        -- the current body of your loop goes here, but 
        -- instead of an UPDATE, you would assign the 
        -- values to more arrays (the set columns of your 
        -- current update), eg:

        vrep_array(i) := vrep;
        vpaid_array(i) := vpaid;
        ... and so on ...
    end loop;

    forall i in 1 .. l_rowid_array.count
        update big_table
           set reported = v_rep_array(i), 
               paid = v_paid_array, 
               ...
         where rowid = l_rowid_array(i);

    exit when c%notfound;
end loop;

 

Next Steps


 ASK Tom
Tom Kyte answers your most difficult technology questions. Highlights from that forum appear in this column.
 

READ more Tom
 Oracle Database Concepts 11g Release 2 (11.2)
 Expert Oracle Database Architecture: Oracle Database Programming 9i, 10g, and 11g Techniques and Solutions, Second Edition)

 DOWNLOAD Oracle Database 11g Release 2

That’ll fetch 1,000 rows, process the data, and then bulk-update 1,000 rows back into the table (by rowid—and with no index access).

That is your first step. (There was a ton of procedural code in the original statement—far too much for me to sit down and rewrite in a couple of minutes as a single SQL statement.) This first step will accomplish a few things: 

  • The COMMIT after each row is removed. Your original code was not restartable. If it failed halfway through due to an ORA-1555 or any other error, your data would be left in a very inconsistent state from which you would not be able to recover. The data would be logically corrupt. Not only that, but committing after every row is also slow.  

  • The update is done by ROWID instead of via a primary key lookup. With your original code, you perform 27,000,000 INDEX UNIQUE SCANS on your primary key. That is at least three, if not four or five, I/Os performed for each lookup! This ROWID optimization alone will probably save you around 100,000,000 I/Os.  

  • The code fetches and updates 1,000 records at a time. It eliminates 999 out of every 1,000 round-trips to the server. (Even though we are in PL/SQL, we still have “round-trips” to the server. They do not go over the network, but they do draw on CPU and other processing resources.)

     

To show the net effect of this last improvement, I’ll present a bit of material I use in an “Efficient PL/SQL” presentation. Among other things, I discuss the reasons for bulking up and when to bulk up versus when not to.

In short, if you are in PL/SQL, retrieving data from the database, and not sending it back to the database, you do not need to bulk-collect. The following bit of code is a perfect example: 

for x in (select * from t where …)
loop
   dbms_output.put_line( … t.x … );
end loop;

 

In this example, I retrieve data from the database via the SELECT statement but I don’t go back to the database to INSERT/UPDATE/DELETE some bit of data, row by row. The SELECT statement is already using bulk processing—even if it doesn’t look like it. Since Oracle Database 10g, the “for x in (select …)” construct has been optimized to silently array-fetch 100 rows at a time. So all the SQL in the preceding code snippet already employs bulk processing.

This code snippet— 

for x in ( select * from t where … )
loop
process(x);
   update t set … where …;
end loop;

 

—needs some work, however. It is doing array fetching for the SELECT but not array processing for the UPDATE. I need to revise the code to do explicit array processing. As a more concrete example, let’s look at the stored procedure in Listing 3.

Code Listing 3: The slow_by_slow stored procedure 

create or replace procedure slow_by_slow
as
begin
    for x in (select rowid rid, object_name
                from t t_slow_by_slow)
    loop
        x.object_name := substr(x.object_name,2)
                      ||substr(x.object_name,1,1);
        update t 
           set object_name = x.object_name
         where rowid = x.rid;
    end loop;
end;

 

It reads every record in table T and performs some processing on it. It then updates the row back in the table by ROWID, but it does so with “slow by slow”—also known as “row by row”—processing. The optimal way to write that code would be to use the bulk procedure in Listing 4.

 

Code Listing 4: The preferred bulk stored procedure 

create or replace procedure bulk
as
    type ridArray is table of rowid;
    type onameArray is table 
            of object_name%type;

    cursor c is select rowid rid, object_name
                  from t t_bulk;
    l_rids     ridArray;
    l_onames   onameArray;
    N          number := 100;
begin
    open c;
    loop
        fetch c bulk collect 
        into l_rids, l_onames limit N;
        for i in 1 .. l_rids.count
        loop
           l_onames(i) := substr(l_onames(i),2)
                       ||substr(l_onames(i),1,1);
        end loop;
        forall i in 1 .. l_rids.count
           update t 
              set object_name = l_onames(i) 
            where rowid = l_rids(i);
        exit when c%notfound;
    end loop;
    close c;
end;

 

Here I explicitly array-fetch a configurable number of rows (I used N with a default of 100 to set the array size), and once I get N rows, I process them. Then I take the 100 processed rows and bulk-update them back into the database, using the FORALL statement with the UPDATE. The results can be astounding: I ran the slow_by_slow and bulk procedures with SQL_TRACE enabled, and Listing 5 shows the TKPROF report.

Code Listing 5: TKPROF for slow_by_slow and bulk procedures 

Slow-by-slow processing in the slow_by_slow procedure
UPDATE T SET OBJECT_NAME = :B2 WHERE ROWID = :B1

call     count     cpu        elapsed       disk    query   current     rows
———————— ————————— —————————— ———————— ————————— ———————— ————————— ————————
Parse            1       0.00     0.00         0        0         0        0
Execute      71824      21.25    22.25         0    71836     73950    71824
Fetch            0       0.00     0.00         0        0         0        0
———————— ————————— —————————— ———————— ————————— ———————— ————————— ————————
total        71825      21.25    22.25         0    71836     73950    71824
Bulk processing in the bulk procedure
UPDATE T SET OBJECT_NAME = :B1 WHERE ROWID = :B2

call     count     cpu        elapsed       disk    query   current     rows
———————— ————————— —————————— ———————— ————————— ———————— ————————— ————————
Parse            1       0.00     0.00         0        0         0        0
Execute        719      12.83    13.77         0    71853     74185    71825
Fetch            0       0.00     0.00         0        0         0        0
———————— ————————— —————————— ———————— ————————— ———————— ————————— ————————
total          720      12.83    13.77         0    71853     74185    71825

 

Note the CPU drop from 21.25 down to 12.83—that is all due to the array processing. Instead of “bugging” the database kernel 71,824 times, the UPDATE sent it data only 719 times. That makes the switching from PL/SQL to the SQL layer that much more efficient.

As a side note, I encourage you to spend some time reverse-engineering your code, writing a specification for it—as a set of requirements—and then attempting to develop a single SQL statement. If I did that for my example above, the UPDATE would look something like 

UPDATE t SET object_name = SUBSTR(object_name,2) || SUBSTR(object_name,1,1)

 

Listing 6 shows the TKPROF report. That is probably what you need for your 27 million rows: something that runs 10 to 20 times as fast as what you already have (if not faster!).

Code Listing 6: TKPROF for the single SQL statement 

UPDATE T SET OBJECT_NAME = SUBSTR(OBJECT_NAME,2) || SUBSTR(OBJECT_NAME,1,1)

call         count       cpu   elapsed      disk   query    current     rows
———————— ————————— —————————— ———————— ————————— ———————— ————————— ————————
Parse            1       0.00     0.00         0        0         0        0
Execute          1       1.30     1.44         0     2166     75736    71825
Fetch            0       0.00     0.00         0        0         0        0
———————— ————————— —————————— ———————— ————————— ———————— ————————— ————————
total            2       1.30     1.44         0     2166     75736    71825

 


Tom Kyte Headshot


Tom Kyte is a database evangelist in Oracle’s Server Technologies division and has worked for Oracle since 1993. He is the author of Expert Oracle Database Architecture (Apress, 2005, 2010) and Effective Oracle by Design (Oracle Press, 2003), among other books.

 

 



Send us your comments