TECHNOLOGY: Ask Tom
On Deferring and Bulking UpBy Tom Kyte
Our technologist prefers not to defer and works to increase bulk.
In your column “On Constraints, Metadata, and Truth,” you wrote about how constraints (or the lack thereof) can really have an impact on query plans. I have a question on constraints: given that you cannot alter an existing constraint to be deferrable—you have to drop it and re-create it as deferrable—would it be OK to create all constraints as deferrable, just in case you wanted to defer them someday?
The short answer is no, you do not want to create all constraints as deferrable “just in case.” The longer answer is still no, but with an explanation.
By default, constraints are implemented in an IMMEDIATE fashion—integrity constraints are checked immediately after the entire SQL statement has been processed.
Since Oracle8 Database, the database program has had the ability to defer constraint checking, which can be quite advantageous for various operations. One that immediately leaps to mind is the requirement to cascade an UPDATE of a primary key to the child keys. Many people claim that you should never need to do this—that primary keys are immutable (I am one of those people)—but many others persist in their desire to have a cascading UPDATE. Deferrable constraints make this possible.
Before Oracle8 Database, it was actually possible to do a CASCADE UPDATE, but doing so involved a tremendous amount of work and had certain limitations. With deferrable constraints in Oracle8 Database and later releases, it becomes almost trivial. The code can look like this:
SQL> create table parent 2 (pk int primary key ) 3 / Table created. SQL> create table child 2 (fk constraint child_fk_parent 3 references parent(pk) 4 deferrable 5 initially immediate 6 ) 7 / Table created. SQL> insert into parent values ( 1 ); 1 row created. SQL> insert into child values ( 1 ); 1 row created.
The code creates a parent table, PARENT, and a child table, CHILD. The CHILD table references the PARENT table, and the constraint used to enforce that rule is called CHILD_FK_PARENT (child foreign key to parent). This constraint was created as DEFERRABLE, but it was set to INITIALLY IMMEDIATE. This means that the constraint can be deferred until the COMMIT or to some other time. By default, however, the constraint will be validated at the statement level. This is the most common use of deferrable constraints. As defined, the CHILD table behaves the same way tables always have, but the deferrable constraint makes it possible to explicitly change the table’s behavior.
Now let’s try some data manipulation language and see what happens:
SQL> update parent set pk = 2; update parent set pk = 2 * ERROR at line 1: ORA-02292: integrity constraint (OPS$TKYTE.CHILD_FK_PARENT) violated – childrecord found
Because the constraint is in IMMEDIATE mode, this UPDATE fails. I’ll change the mode and try again:
SQL> set constraint child_fk_parent deferred; Constraint set. SQL> update parent set pk = 2; 1 row updated.
Now the UPDATE succeeds.
For demonstration purposes, I’ll show how to check a deferred constraint explicitly before committing it, to see if the statement’s modifications are in agreement with the business rules (in other words, to check that the constraint isn’t currently being violated). It’s a good idea to do this check before releasing control to some other part of the program (which may not be expecting the deferred constraints) or committing:
SQL> set constraint child_fk_parent immediate; set constraint child_fk_parent immediate * ERROR at line 1: ORA-02291: integrity constraint (OPS$TKYTE.CHILD_FK_PARENT) violated – parent key not found
The SET CONSTRAINT fails and returns an error immediately, as expected, because the constraint was violated. The UPDATE to PARENT was not rolled back (that would have violated the statement-level atomicity); it is still outstanding. Also note that the transaction is still working with the CHILD_FK_PARENT constraint deferred because the SET CONSTRAINT statement failed.
Let’s continue now by cascading the UPDATE to CHILD:
SQL> update child set fk = 2; 1 row updated. SQL> set constraint child_fk_parent immediate; Constraint set. SQL> commit; Commit complete.
And that’s the way it works. Note that to defer a constraint, you must create it that way—you have to drop and re-create the constraint to change it from NONDEFERRABLE to DEFERRABLE. That might lead you to believe that you should create all of your constraints as DEFERRABLE INITIALLY IMMEDIATE, just in case you want to defer them at some point. In general, however, that is not true. You want to allow constraints to be deferred only if you have a real need to do so. By creating deferred constraints, you introduce differences in the physical implementation (the structure of your data) that might not be obvious. For example, if you create a deferrable UNIQUE or PRIMARY KEY constraint, the index Oracle Database creates to support the enforcement of that constraint will be a nonunique index. Normally you expect a unique index to enforce a unique constraint, but because you have specified that the constraint can temporarily be ignored, the database can’t use that unique index. Other subtle changes will also be apparent, for example, with NOT NULL constraints. If you allow your NOT NULL constraints to be DEFERRABLE, the optimizer will start treating the column as if it supported NULLs—because it, in fact, does support NULLs during your transaction.
Code Listing 1: Creating a table with a DEFERRABLE constraint and inserting data
SQL> create table t 2 ( x int constraint x_not_null not null deferrable, 3 y int constraint y_not_null not null, 4 z varchar2(30) 5 ); Table created. SQL> insert into t(x,y,z) 2 select rownum, rownum, rpad('x',30,'x') 3 from all_users; 45 rows created. SQL> exec dbms_stats.gather_table_stats( user, 'T' ); PL/SQL procedure successfully completed.
To demonstrate this, suppose you have a table T with the columns and data shown in Listing 1. In this example, column X is created such that when you do a COMMIT, X will not be NULL. However, during your transaction, X is allowed to be NULL, because the constraint is DEFERRABLE. Column Y, on the other hand, is always NOT NULL. If you were to index column Y—
SQL> create index t_idx on t(y); Index created.
—and then run a query to count rows that could make use of this index on Y, but only if Y is NOT NULL, as in the following query—
SQL> select count(*) from t; ——————————————————————————————————— |Id| Operation | Name| Rows| ——————————————————————————————————— | 0| SELECT STATEMENT | | 1| | 1| SORT AGGREGATE | | 1| | 2| INDEX FULL SCAN |T_IDX| 45| ———————————————————————————————————
—you would be happy to see that the optimizer chose to use the small index on Y to count the rows rather than to do a full scan of the entire table T. However, if you dropped that index and indexed column X instead—
SQL> drop index t_idx; Index dropped. SQL> create index t_idx on t(x); Index created.
—and then ran the query to count the rows once more, you would discover that the database does not and, in fact, cannot use your index:
SQL> select count(*) from t; ———————————————————————————————————— |Id| Operation | Name| Rows| ———————————————————————————————————— | 0| SELECT STATEMENT | | 1| | 1| SORT AGGREGATE | | 1| | 2| TABLE ACCESS FULL| T | 45| ————————————————————————————————————
The query did a full scan of the table in order to count the rows. This is because in an Oracle Database B-tree index, the index will not contain an entry for any row in the table such that all of the columns in the index are NULL. Given that X is allowed to be NULL temporarily, the optimizer has to assume that X might be NULL and therefore would not be in the index on X. Hence, a count returned from the index might be different from (wrong in terms of) a count against the table.
You can see that if X had a NONDEFERRABLE constraint placed on it, this limitation would be removed; that is, column X is as good as column Y if the NOT NULL constraint is not DEFERRABLE:
SQL> alter table t drop constraint x_not_null; Table altered. SQL> alter table t modify x constraint x_not_null not null; Table altered. SQL> set autotrace traceonly explain SQL> select count(*) from t; ——————————————————————————————————— |Id| Operation | Name| Rows| ——————————————————————————————————— | 0| SELECT STATEMENT| | 1| | 1| SORT AGGREGATE | | 1| | 2| INDEX FULL SCAN| T_IDX| 45| ———————————————————————————————————
So the bottom line is: Use DEFERRABLE constraints only when you have an identified need to use them. They introduce subtle side effects that can cause differences in your physical database implementation (nonunique versus unique indexes) or query plans—as just demonstrated.
In fact, I would generalize this advice: Use default settings, and deviate from the default settings only when you have a sound reason and on a case-by-case basis.
You have often said that if you can use a simple SQL statement, don’t complicate it by writing procedural code. I have the following FOR LOOP UPDATE, which takes a very long time (of course) and consumes a lot of resources. I’d like to simplify it by getting rid of the FOR LOOP, but there are other statements inside the FOR LOOP besides the UPDATE statement. [Note: I edited this code for brevity. See bit.ly/gDONzd for the original code.]
FOR increc IN (SELECT * FROM big_table ORDER BY many columns) LOOP . . . much procedural code goes here. . . . UPDATE big_table SET … WHERE primary_key = inrec.primary_key; COMMIT; END LOOP;
The big_table table has 27 million rows. Can you please give me some suggestions to make this a better process?
I can outline the first step you would take: bulking it up with BULK COLLECT. It would look like the code in Listing 2.
Code Listing 2: Step 1 to improving slow-by-slow processing—bulking up
declare cursor c is select rowid rid, t.* from big_table t order by many columns; l_limit number := 1000; begin loop -- l_arrayN is a plsql index by table corresponding -- to your columnN - declare of the type of columnN fetch c bulk collect into L_ROWID_ARRAY, l_array1, l_array2, l_array3 ... LIMIT l_limit; for i in 1 .. l_rowid_array.count loop -- the current body of your loop goes here, but -- instead of an UPDATE, you would assign the -- values to more arrays (the set columns of your -- current update), eg: vrep_array(i) := vrep; vpaid_array(i) := vpaid; ... and so on ... end loop; forall i in 1 .. l_rowid_array.count update big_table set reported = v_rep_array(i), paid = v_paid_array, ... where rowid = l_rowid_array(i); exit when c%notfound; end loop;
That’ll fetch 1,000 rows, process the data, and then bulk-update 1,000 rows back into the table (by rowid—and with no index access).
That is your first step. (There was a ton of procedural code in the original statement—far too much for me to sit down and rewrite in a couple of minutes as a single SQL statement.) This first step will accomplish a few things:
To show the net effect of this last improvement, I’ll present a bit of material I use in an “Efficient PL/SQL” presentation. Among other things, I discuss the reasons for bulking up and when to bulk up versus when not to.
In short, if you are in PL/SQL, retrieving data from the database, and not sending it back to the database, you do not need to bulk-collect. The following bit of code is a perfect example:
for x in (select * from t where …) loop dbms_output.put_line( … t.x … ); end loop;
In this example, I retrieve data from the database via the SELECT statement but I don’t go back to the database to INSERT/UPDATE/DELETE some bit of data, row by row. The SELECT statement is already using bulk processing—even if it doesn’t look like it. Since Oracle Database 10g, the “for x in (select …)” construct has been optimized to silently array-fetch 100 rows at a time. So all the SQL in the preceding code snippet already employs bulk processing.
This code snippet—
for x in ( select * from t where … ) loop process(x); update t set … where …; end loop;
—needs some work, however. It is doing array fetching for the SELECT but not array processing for the UPDATE. I need to revise the code to do explicit array processing. As a more concrete example, let’s look at the stored procedure in Listing 3.
Code Listing 3: The slow_by_slow stored procedure
create or replace procedure slow_by_slow as begin for x in (select rowid rid, object_name from t t_slow_by_slow) loop x.object_name := substr(x.object_name,2) ||substr(x.object_name,1,1); update t set object_name = x.object_name where rowid = x.rid; end loop; end;
It reads every record in table T and performs some processing on it. It then updates the row back in the table by ROWID, but it does so with “slow by slow”—also known as “row by row”—processing. The optimal way to write that code would be to use the bulk procedure in Listing 4.
Code Listing 4: The preferred bulk stored procedure
create or replace procedure bulk as type ridArray is table of rowid; type onameArray is table of object_name%type; cursor c is select rowid rid, object_name from t t_bulk; l_rids ridArray; l_onames onameArray; N number := 100; begin open c; loop fetch c bulk collect into l_rids, l_onames limit N; for i in 1 .. l_rids.count loop l_onames(i) := substr(l_onames(i),2) ||substr(l_onames(i),1,1); end loop; forall i in 1 .. l_rids.count update t set object_name = l_onames(i) where rowid = l_rids(i); exit when c%notfound; end loop; close c; end;
Here I explicitly array-fetch a configurable number of rows (I used N with a default of 100 to set the array size), and once I get N rows, I process them. Then I take the 100 processed rows and bulk-update them back into the database, using the FORALL statement with the UPDATE. The results can be astounding: I ran the slow_by_slow and bulk procedures with SQL_TRACE enabled, and Listing 5 shows the TKPROF report.
Code Listing 5: TKPROF for slow_by_slow and bulk procedures
Slow-by-slow processing in the slow_by_slow procedure UPDATE T SET OBJECT_NAME = :B2 WHERE ROWID = :B1 call count cpu elapsed disk query current rows ———————— ————————— —————————— ———————— ————————— ———————— ————————— ———————— Parse 1 0.00 0.00 0 0 0 0 Execute 71824 21.25 22.25 0 71836 73950 71824 Fetch 0 0.00 0.00 0 0 0 0 ———————— ————————— —————————— ———————— ————————— ———————— ————————— ———————— total 71825 21.25 22.25 0 71836 73950 71824 Bulk processing in the bulk procedure UPDATE T SET OBJECT_NAME = :B1 WHERE ROWID = :B2 call count cpu elapsed disk query current rows ———————— ————————— —————————— ———————— ————————— ———————— ————————— ———————— Parse 1 0.00 0.00 0 0 0 0 Execute 719 12.83 13.77 0 71853 74185 71825 Fetch 0 0.00 0.00 0 0 0 0 ———————— ————————— —————————— ———————— ————————— ———————— ————————— ———————— total 720 12.83 13.77 0 71853 74185 71825
Note the CPU drop from 21.25 down to 12.83—that is all due to the array processing. Instead of “bugging” the database kernel 71,824 times, the UPDATE sent it data only 719 times. That makes the switching from PL/SQL to the SQL layer that much more efficient.
As a side note, I encourage you to spend some time reverse-engineering your code, writing a specification for it—as a set of requirements—and then attempting to develop a single SQL statement. If I did that for my example above, the UPDATE would look something like
UPDATE t SET object_name = SUBSTR(object_name,2) || SUBSTR(object_name,1,1)
Listing 6 shows the TKPROF report. That is probably what you need for your 27 million rows: something that runs 10 to 20 times as fast as what you already have (if not faster!).
Code Listing 6: TKPROF for the single SQL statement
UPDATE T SET OBJECT_NAME = SUBSTR(OBJECT_NAME,2) || SUBSTR(OBJECT_NAME,1,1) call count cpu elapsed disk query current rows ———————— ————————— —————————— ———————— ————————— ———————— ————————— ———————— Parse 1 0.00 0.00 0 0 0 0 Execute 1 1.30 1.44 0 2166 75736 71825 Fetch 0 0.00 0.00 0 0 0 0 ———————— ————————— —————————— ———————— ————————— ———————— ————————— ———————— total 2 1.30 1.44 0 2166 75736 71825
Tom Kyte is a database evangelist in Oracle’s Server Technologies division and has worked for Oracle since 1993. He is the author of Expert Oracle Database Architecture (Apress, 2005, 2010) and Effective Oracle by Design (Oracle Press, 2003), among other books.
Send us your comments