By Tom Kyte
Our expert partitions, looks up with a PL/SQL table, and does everything with referential integrity.
We would like to convert all our regular tables into partitioned tables. These regular tables might have millions of rows. The following information shows what we have now:
TableName   Structure
---------   ----------------
TABLE_1
TABLE_2     Same as TABLE_1
TABLE_3     Same as TABLE_1
TABLE_4
TABLE_5     Same as TABLE_4
TABLE_6     Same as TABLE_4
TABLE_7     Same as TABLE_4
TABLE_8     Same as TABLE_4
TABLE_9
TABLE_10    Same as TABLE_1
TABLE_11    Same as TABLE_1
TABLE_12
...         ...
As you can see, several tables have the same structure as other tables. Now, we want partitioned tables as follows:
TableName   Comments
---------   ------------------------
TABLE_1     will also have TABLE_2, TABLE_3, TABLE_10 & TABLE_11 data
TABLE_4     will also have TABLE_5, TABLE_6, TABLE_7 & TABLE_8 data
TABLE_12    only its data
How do we convert all regular tables into partitioned tables? Note that we are using one of the date columns as a partition key. What is the fastest and safest way to migrate?
This is pretty easy to do. In the following, pretend that t1, t2, and t3 are your existing tables. Table t is the table you want (you can drop t1 and rename t to t1 later if needed). I'll use the following approach:
SQL> create table t1 as
     select sysdate dt, all_objects.*
     from all_objects;

Table created.

SQL> create table t2 as
     select add_months(sysdate,-12) dt, all_objects.*
     from all_objects;

Table created.

SQL> create table t3 as
     select add_months(sysdate,-24) dt, all_objects.*
     from all_objects;

Table created.

SQL> create table t
     ( dt, OWNER, OBJECT_NAME, SUBOBJECT_NAME,
       OBJECT_ID, DATA_OBJECT_ID, OBJECT_TYPE,
       CREATED, LAST_DDL_TIME, TIMESTAMP, STATUS,
       TEMPORARY, GENERATED, SECONDARY )
     partition by range(dt)
     ( partition part2000 values less than
         (to_date('01-jan-2001','dd-mon-yyyy')),
       partition part2001 values less than
         (to_date('01-jan-2002','dd-mon-yyyy')),
       partition part2002 values less than
         (to_date('01-jan-2003','dd-mon-yyyy'))
     )
     as
     select sysdate dt, all_objects.*
     from all_objects
     where 1=0;

Table created.
I just created an empty partitioned table, partitioned by date, that is structured exactly like your existing tables. The ranges are such that the data in t3 fits in partition part2000, t2 in part2001, and t1 in part2002. Now, what you want to do is swap the empty partition part2000 with the full table t3, whose data falls into that range, as follows:
SQL> alter table t
  2  exchange partition part2000
  3  with table t3
  4  without validation
  5  /

Table altered.
Do the same for the remaining table/partition pairs: swap t2 with part2001 and t1 with part2002. Now, t is a table whose partitions were formerly known as tables t1, t2, and t3.
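A quick sanity check at this point: partition-extended table names let you count what landed in each partition (a sketch; the counts should match the row counts of the tables you exchanged in):

SQL> select count(*) from t partition (part2000);
SQL> select count(*) from t partition (part2001);
SQL> select count(*) from t partition (part2002);

Note that EXCHANGE PARTITION is a data dictionary operation: it swaps the identities of the segments rather than copying rows, which is why it is both fast and safe even for tables with millions of rows.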
I was told that instead of repeatedly querying a lookup table for lookup values (like querying a description table for a description using description_id), it is much more efficient to load a PL/SQL table and do a lookup on it. Can you show me how we can load a PL/SQL table? How do you perform a lookup using a PL/SQL table? If I want to load a PL/SQL table with values from more than one column, can I do it? Aren't PL/SQL tables single-dimensional?
The answer to this is "it all depends." Let's say you have the following code:
for x in ( select * from t )
loop
    select description
      into l_description
      from lookup
     where id = x.description_id;

    insert into another_table ...
Given this code, a PL/SQL lookup table might well be faster; however, it will never be as fast as doing it right in the first place! Following my mantra of "if you can do it in a single SQL statement, do it," you should use a join. The following example compares an insert with and without a lookup table:
SQL> create table lookup
  2  (id primary key,
  3   description )
  4  organization index
  5  as
  6  select object_id, object_name
  7  from all_objects;

Table created.
That will be the lookup table: a table of codes (id) for descriptions. In a moment, you'll see how to load that into a PL/SQL index-by table with which to perform lookups. But first, I'll generate another table of test data—the input data on which I want to do lookups:
SQL> create table t
  2  as
  3  select object_id
  4  from all_objects;

Table created.
And then I'll create a table to "load into." I'll be simulating a typical data load, which includes reading some data, doing lookups, and loading into another table:
SQL> create table target
  2  (id number, descript varchar2(30));

Table created.
I'll start with the PL/SQL lookup table approach. First, I have to load up the PL/SQL table type; that is accomplished in lines 8 through 12. Then, I iterate over the data to be processed, do the lookup, and insert it into the target table:
SQL> declare
  2      type lookupTableType is table of
  3          lookup.description%type index by
  4          binary_integer;
  5
  6      l_lookup lookupTableType;
  7  begin
  8      for x in (select * from lookup)
  9      loop
 10          l_lookup(x.id) :=
 11              x.description;
 12      end loop;
 13
 14      for x in (select * from t)
 15      loop
 16          begin
 17              insert into target
 18              values
 19              ( x.object_id,
 20                l_lookup(x.object_id) );
 21          exception
 22              when no_data_found then
 23                  insert into target
 24                  values
 25                  ( x.object_id, null );
 26          end;
 27      end loop;
 28  end;
 29  /

PL/SQL procedure successfully completed.
Elapsed: 00:00:13.24
As you can see, it took a little over 13 seconds to do the lookup. Can we do better? Definitely:
SQL> begin
  2      insert into target
  3      select t.object_id, l.description
  4      from t, lookup l
  5      where t.object_id = l.id(+);
  6  end;
  7  /

PL/SQL procedure successfully completed.
Elapsed: 00:00:00.43
This single insert does the same exact thing in a mere fraction of the time. Almost any time you can remove procedural code and do the same thing in a single SQL statement, you'll be better off. There's less code to type and maintain, and it's generally much faster, to boot.
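If you would like to convince yourself that the two approaches load the same data, a set comparison does it (a sketch; it assumes you copied the first run's rows into a hypothetical table target_plsql and truncated target before the second run):

SQL> select id, descript from target_plsql
  2  minus
  3  select id, descript from target;

no rows selected

The (+) outer join plays the same role as the NO_DATA_FOUND handler in the procedural version: rows whose ids are missing from LOOKUP still load, with a NULL description.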
We have a table, city , consisting of different cities where our clients' offices are located. The city table consists of metadata: column values that are not likely to change at all. We have some employee allowances calculated based on the city where the employee works.
We have a Visual Basic form for inserting employee-related details. We have a foreign key on the city column in the employee table, and the parent key is the city table.
One of our consultants recommended that we discard checking the validity of the city entered and maintain all city validation checking through front-end coding. The reasons cited were that referential integrity checking at the back end is too time-consuming and slows down the data-entry processing jobs.
Initially, we would have a lot of data entry, and the number of cities in the city table is 25. I wasn't convinced by the consultant's recommendations. Is the argument valid?
You are wise; the consultant is wrong.
Doing this checking on the client will not be faster. Unless the lookup table is cached on the client, every check means a round trip to the database, with the entire table coming back over the network each time. And if the client does cache this information, it will succeed only in logically corrupting the data at some point: the moment you decide to update the city table, every cached copy is stale, unless you first ask all your users to exit the program! I suppose that is what they are thinking here: "cache it" and "cache is faster."
You must ask yourself: Is this the only application that will access this data forever? (This is a rhetorical question; history proves that the answer is no). So, if you hide all the rules of how the data relates in the application, what then? In a year, you have to dig it all out again and reimplement it over and over.
Did the consultant benchmark this to see the impact on performance? Let's do a quick check. I'll use the following simple schema:
drop table cities;

create table cities
as
select username city
  from all_users
 where rownum <= 50;

alter table cities
  add constraint cities_pk
  primary key(city);

create table t1
( x    char(80),
  city references cities
);

create table t2
( x    char(80),
  city varchar2(30)
);
Now for the benchmark. I'll insert heavily into t1 with declarative referential integrity enabled and into t2 without it. To do that, I'll use the following code:
alter session set sql_trace=true;

declare
    type array is table of varchar2(30)
        index by binary_integer;
    l_data array;
begin
    select * BULK COLLECT into l_data
      from cities;

    for i in 1 .. 1000
    loop
        for j in 1 .. l_data.count
        loop
            insert into t1
            values ('x', l_data(j) );
        end loop;
    end loop;
end;
/
Copy this PL/SQL block and change t1 to t2 for the second test. Now, using the TKPROF tool to format the SQL_TRACE output, you discover the following (some information edited out for space):
INSERT into t1 values ('x', :b1 )

call      count    cpu  query  current
-------  ------  -----  -----  -------
Parse         1   0.00      2        0
Execute   37000   9.25    479    78969
Fetch         0   0.00      0        0
-------  ------  -----  -----  -------
total     37001   9.26    481    78969

INSERT into t2 values ('x', :b1 )

call      count    cpu  query  current
-------  ------  -----  -----  -------
Parse         1   0.00      0        0
Execute   37000   7.87    484    41900
Fetch         0   0.00      0        0
-------  ------  -----  -----  -------
total     37001   7.88    484    41900
So, with referential integrity we get 0.00025 CPU seconds per single-row insert (9.25/37,000) and 0.000213 CPU seconds per single-row insert without it (7.87/37,000). Will your end users notice? Will you notice? The vast majority of the end user wait experience will come from the VB program painting screens. Do you think this consultant's lookup routine will execute in the 0.000037 seconds it would need just to break even? If not, you are better off in the database.
Not only that, but suppose you decide to run an ad hoc query on this application data later. The tools will have the referential integrity to guide them. When you go to extend this application later, no new application can destroy the integrity, which it could otherwise do easily. And when you change the list of cities, you won't have to first shut down all clients (to clear their caches) and then restart them. You'll be able to sleep at night knowing your database is protected.
The benefits you get—data integrity, ease of maintenance, and self-documenting code, to name a few—far outweigh any perceived time issues here.
I need to know how to identify the data definition language (DDL) actions performed in the last 24 hours. If a developer overwrites some function, is there any way to get the old code from SYSTEM tables? Also, how can I identify the dependent objects and get the list of other objects dependent on the object in all schemas?
You want to enable auditing. It won't help you for the last 24 hours, but tomorrow you'll be set to go. Once someone overwrites the code, the old code is gone, wiped out, and just not there anymore. Short of your restoring the database and doing a point-in-time recovery to before the time someone dropped the procedure, that code is gone.
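Enabling statement auditing for the relevant DDL is a one-liner (a sketch; it assumes the AUDIT_TRAIL initialization parameter is set to DB so that audit records land in the database):

SQL> audit procedure;

Audit succeeded.

The PROCEDURE statement option covers CREATE and DROP of procedures, functions, packages, and libraries. Tomorrow, reviewing the last 24 hours is just a query:

SQL> select username, obj_name, action_name, timestamp
  2  from dba_audit_trail
  3  where timestamp > sysdate - 1;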
You could use a DDL trigger (before CREATE ) to capture the code and do the audit as well. This would work in so many cases that I'll expand on the idea and implement it here. I'll start with a table to hold the old code (and multiple versions of it) and record who overwrote it and when:
SQL> create table old_code
  2  as
  3  select user username, 0 version,
            sysdate date_changed, user_source.*
  4  from user_source
  5  where 1=0
  6  /

Table created.

SQL> create sequence version_seq;

Sequence created.
Now I can use this event trigger before CREATE on a schema to save any "old code" before it is overwritten:
SQL> create or replace trigger create_trigger
  2  before create on schema
  3  declare
  4      l_date date := sysdate;
  5      l_ver  number;
  6  begin
  7      if (ora_dict_obj_type in ( 'PACKAGE', 'PACKAGE BODY',
                                    'PROCEDURE', 'FUNCTION' ) )
  8      then
  9          select version_seq.nextval into l_ver from dual;
 10
 11          insert into old_code
 12          select user, l_ver, l_date, user_source.*
 13          from user_source
 14          where name = ora_dict_obj_name
 15          and type = ora_dict_obj_type;
 16      end if;
 17  end;
 18  /

Trigger created.
To show this trigger in action, the following example walks through a case where I replace a function in the database:
SQL> create or replace function f return number
  2  as
  3  begin
  4      return 0;
  5  end;
  6  /

Function created.

SQL> create or replace function f return date
  2  as
  3  begin
  4      return sysdate;
  5  end;
  6  /

Function created.

SQL> select line, text from old_code order by line;

LINE TEXT
---- -------------------------
   1 function f return number
   2 as
   3 begin
   4 return 0;
   5 end;
The original code is safe and secure. I kept track of who overwrote it and when (so I can yell at that person if need be), and I have the source in the event that I need to restore it.
As for getting the dependencies, the USER_DEPENDENCIES, ALL_DEPENDENCIES , and DBA_DEPENDENCIES data dictionary views contain all that for you.
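For example, to see everything in any schema that depends on a given function (a sketch; substitute your own owner and object name for the hypothetical SCOTT.F):

SQL> select owner, name, type
  2  from dba_dependencies
  3  where referenced_owner = 'SCOTT'
  4  and referenced_name = 'F'
  5  and referenced_type = 'FUNCTION';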
We load between 4 and 15 flat files from two different legacy systems into our data warehouse nightly. As part of our regular daily audit of the data loads, we need to verify the record count of the files (sent via FTP). All files are fixed-length. Record lengths vary from 100 characters to more than 3,500 characters. All files are in the US7ASCII character set. The number of records for the "narrower" files ranges from fewer than 50 to 250,000; the number of records for the "wider" files ranges from 10,000 to 4 million. Overall file sizes range from several kilobytes to over 100MB. Executing "wc -l" on our largest files typically runs in about a minute.

Getting the actual OS flat-file record count is the stumper. I've seen the examples for getting a directory listing into a database table via Java and examples of executing OS commands from PL/SQL via Java as well, but I'm not a Java programmer, and I need to combine these two example cases so that I can get a record count of all files in a given directory and have the results stored in a database table. Any suggestions?
Since they are fixed-length files, all you need to do is get the file size and then divide by the record size. That is easy. Using a BFILE, you can query a file's size, and you should already know the logical record length (LRECL) for the files so you can divide by it:
SQL> create or replace directory
  2  MY_FILES as '/tmp/';

Directory created.

SQL> create or replace
  2  function number_of_records
        ( p_filename in varchar2, p_lrecl in number )
  3  return number
  4  as
  5      l_bfile bfile;
  6      l_recs  number;
  7  begin
  8      l_bfile := bfilename('MY_FILES', p_filename);
  9      dbms_lob.fileopen( l_bfile );
 10      l_recs := dbms_lob.getlength(l_bfile) / p_lrecl;
 11      dbms_lob.fileclose( l_bfile );
 12      return l_recs;
 13  end;
 14  /

Function created.

SQL> !wc -l /tmp/foo.dat
      18 /tmp/foo.dat

SQL> exec dbms_output.put_line (number_of_records( 'foo.dat', 20 ));
18

PL/SQL procedure successfully completed.
If for whatever reason you do not know the LRECL, you could go a step further:
SQL> create or replace
  2  function number_of_records ( p_filename in varchar2 )
  3  return number
  4  as
  5      l_bfile bfile;
  6      l_lrecl number;
  7      l_recs  number;
  8  begin
  9      l_bfile := bfilename( 'MY_FILES', p_filename );
 10      dbms_lob.fileopen( l_bfile );
 11      l_lrecl := dbms_lob.instr
                    ( l_bfile, utl_raw.cast_to_raw(chr(10)));
 12      l_recs := dbms_lob.getlength( l_bfile ) / l_lrecl;
 13      dbms_lob.fileclose( l_bfile );
 14      return l_recs;
 15  end;
 16  /

Function created.

SQL> exec dbms_output.put_line (number_of_records('foo.dat'));
18

PL/SQL procedure successfully completed.
This function looks for the first newline character ( chr(10) ) in the file and assumes its position is the LRECL for the file, newline included.
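To cover the whole nightly load in one pass and store the results in a database table, as you asked, you can drive the function from a table of expected feeds (a sketch; expected_files is a hypothetical table you maintain with one row per feed, holding the filename and its LRECL):

SQL> create table file_counts
  2  ( filename   varchar2(255),
  3    rec_count  number,
  4    audited_on date );

Table created.

SQL> insert into file_counts
  2  select f.filename,
  3         number_of_records( f.filename, f.lrecl ),
  4         sysdate
  5  from expected_files f;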
Tom Kyte (firstname.lastname@example.org) has worked for Oracle since 1993. Kyte is a vice president in the Oracle Government, Education, and Health group; the author of Expert One-on-One: Oracle; and a coauthor of Beginning Oracle Programming (Wrox Press).