TECHNOLOGY: Ask Tom
On Redefinition, Nature, and TriggersBy Tom Kyte
Our technologist redefines tables, compares keys, and warns about DDL in triggers.
I have a table that contains millions of records, and I need to update it regularly. I want to perform an UPDATE and a COMMIT for every, say, 10,000 records. I don't want to do this in one step, because I may end up with rollback segment issues. Any suggestions?
Well, this is much more complex than it looks. Suppose you do break this large transaction up into many small ones, and halfway through, something, such as an ORA-01555 (snapshot too old) error caused by your committing frequently or a system failure, goes wrong. So now your batch update is partway doneit is "somewhere in the middle"and needs to be restarted. Unless you wrote lots of code to make it restartable, you might have a huge mess on your hands. How do you pick up where you left off?
For example, the problem with code that looks like this:
declare cursor c is select * from t; begin open c; loop fetch c bulk collect into l_data limit 500; ... some process ... forall I in 1 .. l_data.count update t set ... /* using l_data */ commit; exit when c%notfound; end loop; close c; end;
is that the odds of an ORA-01555 are very high, because you are reading the table you are modifying and the SELECT * FROM T must be as of the time when the query was started. As you are modifying this table, the probability of an ORA-01555 goes upand you are the cause of it. The undo you generate with your UPDATE of table T is likely to be needed by your SELECT on table T; however, when you execute a COMMIT, you enable the database to reuse your generated undoand if it does (because the undo retention is set too small or because there is insufficient undo space allocated to hold all of the undo you are generating), you will almost certainly get the ORA-01555 error.
Additionally, when you do run into the ORA-01555 error and the code block fails, how do you restart it? You might need either a column in that table that could tell you if it had been bulk-updated already or another tracking table into which you inserted the primary keys of rows already modified, using "SELECT * FROM T WHERE PK NOT IN (select pk from tracking_table)" or a similar approach to do this.
So you are faced with writing lots of code to accomplish this. My preference would be to do one of the following things:
Let's look at using DBMS_REDEFINITION to create an "updated copy" of a table. The goal is to create a new column that is a concatenation of three existing columns, rid the table of the three existing columns, and sequence the rows on disk by another column (to have the existing data sorted by that column on disk). I'll start with a copy of ALL_OBJECTS to test with, as shown in Listing 1.
Code Listing 1: Initial table T
SQL> create table t 2 as 3 select OWNER, OBJECT_NAME, SUBOBJECT_NAME, 4 OBJECT_ID, DATA_OBJECT_ID, OBJECT_TYPE, 5 CREATED, LAST_DDL_TIME, TIMESTAMP, STATUS, 6 TEMPORARY, GENERATED, SECONDARY 7 from all_objects 8 order by dbms_random.random; Table created. SQL> alter table t 2 add constraint t_pk 3 primary key(object_id); Table altered. SQL> select object_name 2 from t 3 where rownum <= 5 4 / OBJECT_NAME -------------------- java/rmi/MarshalException SYS_C008650 ALL_IDENTIFIERS /5ed18cf1_SimpleAuthPopupBasic /1cca9769_MonitoredObjectImpl
So I have a table T with a constraint (and possibly more, such as grants, indexes, triggers, and so on). As you can see in Listing 1, the data is stored randomly on diskit is definitely not sorted by OBJECT_NAME, as the full table scan I started with a SELECT * FROM T shows. I would like to take three columnsTEMPORARY, GENERATED, and SECONDARYand concatenate them into a new FLAGS column. Further, I would like to "drop" the TEMPORARY, GENERATED, and SECONDARY columns as well as the SUBOBJECT_NAME, DATA_OBJECT_ID, and TIMESTAMP columns from the new table and, last, organize the existing data by OBJECT_NAME. Note that any newly added data will not be stored in sorted order in the table.
To accomplish that, I'll need an interim table for copying the existing data into:
SQL> create table t_interim 2 ( 3 object_id number, 4 object_type varchar2(18), 5 owner varchar2(30), 6 object_name varchar2(30), 7 created date, 8 last_ddl_time date, 9 status varchar2(7), 10 flags varchar2(5) 11 ) 12 / Table created.
Now I am ready to begin the UPDATE, using the DBMS_REDEFINITION packageavailable with Oracle9i Database and aboveto do an online table redefinition, as shown in Listing 2. (The ability to sort the data during a redefinition was added in Oracle Database 10g Release 1.)
Code Listing 2: DBMS_REDEFINITION.START_REDEF_TABLE procedure
SQL> declare 2 l_colmap varchar(512); 3 begin 4 l_colmap := 5 'object_id, 6 object_type, 7 owner, 8 object_name , 9 created, 10 last_ddl_time, 11 status, 12 temporary || ''/'' || 13 generated || ''/'' || 14 secondary flags '; 15 16 dbms_redefinition.start_redef_table 17 ( uname => user, 18 orig_table => 'T', 19 int_table => 'T_INTERIM', 20 orderby_cols => 'OBJECT_NAME', 21 col_mapping => l_colmap ); 22 end; 23 / PL/SQL procedure successfully completed.
The COL_MAPPING parameter in the START_REDEF_TABLE procedure is what does the UPDATE and DROP column magic. You basically execute a SELECT on the data, using the COL_MAPPING parameter, which can include functions (which you would have used in the SET clause of the UPDATE). The ORDERBY_COLS parameter accomplishes the resequencing of the existing table data on disk. If you enable SQL_TRACE=TRUE when executing the call to START_REDEF_TABLE, you will see an INSERT like this being executed:
INSERT /*+ BYPASS_RECURSIVE_CHECK APPEND */ INTO "OPS$TKYTE"."T_INTERIM" ("OBJECT_ID","OBJECT_TYPE","OWNER", "OBJECT_NAME", "CREATED", "LAST_DDL_TIME","STATUS","FLAGS") SELECT "T"."OBJECT_ID", "T"."OBJECT_TYPE", "T"."OWNER","T"."OBJECT_NAME", "T"."CREATED","T"."LAST_DDL_TIME", "T"."STATUS", "T"."TEMPORARY"||'/'|| "T"."GENERATED"||'/'|| "T"."SECONDARY" FROM "OPS$TKYTE"."T" "T" ORDER BY OBJECT_NAME
Note that the TEMPORARY, GENERATED, SECONDARY, SUBOBJECT_NAME, DATA_OBJECT_ID, and TIMESTAMP columns do not get copied to the T_INTERIM table but the TEMPORARY, GENERATED, and SECONDARY columns get concatenated into the new FLAGS column.
Because that INSERT includes APPEND in the hint, you can not only bypass UNDO (which happens by default with APPENDno undo is generated for the T_INTERIM table during this initial load) but can also, if you choose, bypass REDO generation for this table as well by altering it to be NOLOGGING before performing the START_REDEF_TABLE procedure. (If you bypass REDO generation, make sure to coordinate with the group responsible for backups before doing that! They'll need to schedule a backup of the affected data file shortly after this operation to make the new data recoverable).
That START_REDEF_TABLE procedure I just invoked did a couple of things. It copied the data from T to T_INTERIM, copied only the data of interest, sorted the data during the load, and did it all efficiently (bypassing UNDO and optionally REDO). It also set up just enough replication between T and T_INTERIM to enable me to keep them in sync, so that at the end of the redefinition, the two tables are logically equivalentthey have the same number of rows.
Now I need to copy over the dependent "things"such as indexes, constraints, and grants. I can use the COPY_TABLE_DEPENDENTS API call (a feature of Oracle Database 10g and above) to perform this, or I can copy over the table dependents myself, using DDL (along with any option I'd like: NOLOGGING, PARALLEL, and so on). In this example, I use the COPY_TABLE_DEPENDENTS API call:
SQL> variable nerrors number SQL> begin 2 dbms_redefinition.copy_ table_dependents 3 ( user, 'T', 'T_INTERIM', 4 copy_indexes => dbms_ redefinition.cons_orig_params, 5 num_errors => :nerrors ); 6 end; 7 / PL/SQL procedure successfully completed. SQL> print nerrors NERRORS ------ 0
That did it. If you were to query the data dictionary now, you'd see two tables with equivalent constraints, grants, triggers, indexes, and so on. I am ready to finish the redefinition nowit will involve synchronizing the tables and then locking both T and T_INTERIM for a brief moment to swap their object names in the data dictionary so T will become T_INTERIM and T_INTERIM will become T:
SQL> begin 2 dbms_redefinition.finish_ redef_table 3 ( user, 'T', 'T_INTERIM' ); 4 end; 5 / PL/SQL procedure successfully completed. SQL> select object_name, flags 2 from t 3 where rownum <= 5; OBJECT_NAME FLAGS ----------------- --- /1000323d_DelegateInvocationHa N/N/N /1000323d_DelegateInvocationHa N/N/N /1000e8d1_LinkedHashMapValueIt N/N/N /1000e8d1_LinkedHashMapValueIt N/N/N /1005bd30_LnkdConstant N/N/N
As you can see, table T is now the updated table, with the new FLAGS column and with the existing rows in the table sorted on disk by OBJECT_NAME.
As previously mentioned, using CREATE TABLE AS SELECT to select data into a new table could produce a result similar to using DBMS_REDEFINITION, but the process, including the creation of indexes, grants, and constraints, would be more manual. However, in the standard edition of Oracle Database, where online redefinition is not available, using CREATE TABLE AS SELECT might be the most efficient way to accomplish this.
Natural or Surrogate Keys
The project I'm currently working on has the database design rule that all tables must have a surrogate key. This is required even if a perfectly good natural key exists.
The primary motivation appears to be to improve join efficiency by removing the possibility of having to join two tables on more than one column.
Personally I'm not a fan of surrogate keys in general or of this sort of blanket policy in particular. I believe that there is often much to be gained by having the child table inherit the primary key of the parent table as part of the primary key of the child table.
Do you have any comment about natural keys versus surrogate keys in general? Do you favor surrogate keys over natural keys? How tight would performance considerations have to be to justify such a scheme?
Ahh, the age-old debateone that neither side will ever "win." Having a rule such as this is the perfect example of why I don't really like the term best practice , or ROTs (rules of thumb). One person's best practice is another person's nightmare.
If you have a natural key, by all means, use it. The natural key should be immutable and sensible; they are sometimes rare in real life, but they do exist.
For example, if I had a DOCUMENT table and a DOCUMENT_VERSION table, I would definitely use document_id (which might be a surrogate) as the primary key of one table and the combined document_id,version# as the primary key of the other table (and DOCUMENT_VERSION might have an association with authors, so its primary key is a foreign key elsewhere in the system too).
That is, I would set it up like this:
create table document ( document_id number primary key, -- populated by a sequence perhaps other_data... ); create table document_version ( document_id references document, version# number, other_data ... constraint doc_ver_pk primary key (document_id,version#) );
for the reasons you just listed. One thing to be very sure of, however, is that the primary key is immutable. Never changing. In this case, the surrogate key in the DOCUMENT table is immutable and the natural key (which happens to include a surrogate key from something else) in the DOCUMENT_VERSION table is as well.
The natural key would have to be present in my table from the get-go, with NOT NULL and UNIQUE constraints on it. The use of a surrogate key here would only add to the work of all INSERT operationshaving to now generate a surrogate key as well as uniquely constrain both the surrogate key and the natural key. So, if the natural key is both immutable and reasonable, by all means use it. (In this case, reasonable means, for example, that it does not take 17 columns to store the natural keyhaving 2, 3, 4, or maybe even 5 columns is reasonable.) To read the many and varied discussions of this divisive topic, see asktom.oracle.com/pls/ask/search?p_string=%22natural+key%22.
DDL in Triggers
Everything I have ever read about triggers explicitly states that DDL, because of an implicit COMMIT, cannot be used within a trigger. I have, though, seen in blogs and other places that people have claimed to have gotten it to work through various hacks, but I have never been successful in doing so. I have two questions relating to this:
1. Why doesn't using pragma autonomous_transaction solve this within a procedure called from a trigger?
2. Do you know of a workaround for executing DDL in a procedure called from a trigger?
Well, to answer No. 1, pragma autonomous_transaction would permit you to do DDL in a trigger, but thankfully you never succeeded. Be very thankful of that. In answer to No. 2, in the extremely rare case extremely rare case where this is actually desirable and necessary, I suggest using DBMS_JOB to schedule the CREATE statement after your transaction has committed.
First, think for a moment about the ramifications of doing nontransactional work in a trigger. What happens when you need to roll back? Well, of course, the DDL would not roll backyou would be left "halfway there." The DDL would have happened, but the transaction that caused the DDL would not have happened (having been rolled back). You would be left with a mess.
Whenever you are tempted to do something nontransactional in a trigger, think 500 times more about it and then always decide against it. It can lead only to really bad things.
If you use DBMS_JOB, it will look something like this:
SQL> create table do_ddl 2 ( job number primary key, 3 stmt varchar2(4000) 4 ); Table created. SQL> create or replace 2 procedure do_ddl_safely ( p_job in number ) 3 is 4 l_rec do_ddl%rowtype; 5 begin 6 select * 7 into l_rec 8 from do_ddl 9 where job = p_job; 10 execute immediate l_rec.stmt; 11 end; 12 / Procedure created.
And then you will use a block of code similar to the following to invoke the DDL procedure shortly after you commit your transaction:
SQL> declare 2 l_job number; 3 begin 4 dbms_job.submit 5 ( l_job, 'do_ddl_safely(JOB);' ); 6 insert into do_ddl 7 ( job, stmt ) values 8 ( l_job, '...The statement to be executed...' ); 9 end; 10 /
DO_DDL_SAFELY is a stored procedure you write that does DDL, catching errors if needed, notifying people, and correctly doing whatever needs to be done. It will execute shortly after you commit.
And best of all, if you roll back, the INSERT into the job queue does too. You are protectedthe DDL won't happen. Use this approach anytime you are thinking of doing something nontransactional in a trigger.
See this series on a related topic (write consistency), including how triggers might fire more than once for a given statementyet another reason to avoid nontransactional operations in a trigger:
Tom Kyte is a database evangelist in Oracle's Server Technology division and has worked for Oracle since 1993. He is the author of Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions (Apress, 2005) and Effective Oracle by Design (Oracle Press, 2003), among others.