TECHNOLOGY: Ask Tom
On Ignoring, Locking, and ParsingBy Tom Kyte
Our technologist looks out for WHEN OTHERS, locks, cascades, and parses.
There's a programming construct, WHEN OTHERS, in PL/SQL that I wish weren't there. This clause, when used in an exception block, catches any unhandled exceptions. There are times when this is useful, such as for logging errors:
exception when others then log_error(....); raise; end;
But the problem I see is that far too many people use the WHEN OTHERS clause without following it by a RAISE or RAISE_APPLICATION_ERROR call. That effectively hides the error. In reality, the error happened, but it was not handled in any meaningful way; it was just ignored—silently. The invoker of your code has no idea that something went horribly wrong and that your code failed, and he or she typically thinks that everything was, in fact, successful.
On Ask Tom (asktom.oracle.com), I see questions and responses about this mistake made over and over again. For example, I recently received this question:
I created a package that spools 10 jobs with ranges of data to bulk-load into a flat file. I need to use UTL_FILE, because in the middle of the process, I call three procedures to retrieve some data. The total time of the process is six hours to bulk-load 9,500,000 strings into the file. I need your advice, because I need to decrease the total time.
This is the code:
PROCEDURE prcl_MakeFile(...) IS ... variables ... BEGIN l_FileID := UTL_FILE.FOPEN (...); OPEN cur; LOOP ... process records here ... ... lots of code ... END LOOP; CLOSE cur; UTL_FILE.FCLOSE(l_FileID); EXCEPTION WHEN OTHERS THEN IF (UTL_FILE.IS_OPEN(l_FileID)) THEN UTL_FILE.FCLOSE(l_FileID); END IF; END prcl_MakeFile;
My answer was simple: I could make this code go infinitely fast. All this procedure needs to do is
PROCEDURE prcl_MakeFile(...) IS ... variables... BEGIN Return; END prcl_MakeFile;
The two procedures are logically equivalent, but mine is much faster! So why are they logically equivalent? It is all because of the WHEN OTHERS clause coded in the procedure—the WHEN OTHERS clause that is not followed by a RAISE or RAISE_APPLICATION_ERROR. Suppose that the call to UTL_FILE .FOPEN failed in that routine—what would happen? All of that code would be skipped, but no one would know . If you have code wrapped with an exception block that contains WHEN OTHERS—but does not reraise the exception—all of the code in that exception block can be safely deleted, in my opinion. You do not need it, because if the code fails to execute, you ignore the fact that it did not execute . If the code is allowed to not execute sometimes, you can, in fact, allow that code to never execute . You cannot ever rely on this code's actually running, so you never have to execute it.
I also assert that it would be safer to not run the code than to run it. At least if you do not execute it, you know what state the database will be in. If you invoke a piece of code that includes the WHEN OTHERS construct (not followed by a reraising of the exception), the database will be left in some unknown state. For example, given the following procedure:
procedure p is begin insert into t1 values(1); insert into t2 values(2); insert into t3 values(3); exception when others then dbms_output.put_line ('something bad happened!'); end;
The invoker of this procedure would never know if
To read more on this important topic, see
Locking on the Web
I recently came across a .NET application running against Oracle Database 10g in which the developers had used optimistic locking (get a version ID from a table, update the required row, and then update the version ID) because it was the only way they could make sure that the same record was not updated by many users at the same time. I believe that this can be done with the FOR UPDATE clause rather than with unnecessary code that creates artificial locking. Am I right?
There are two ways n -tier applications access the database:
1. Stateful: holding a connection for a long period of time over many Web pages
2. Stateless: holding a connection for a very short period of time, maybe less than the time it takes just to generate a single HTML page
Most applications today use method 2, in my experience. In this fashion, end users consume resources only when they are actually "active" in the database. They grab a connection, use it, and release it.
If you have a stateless connection to the database, you cannot use pessimistic (FOR UPDATE) locking. You lose the lock you asked for with the FOR UPDATE clause after each page is generated. For that type of application, optimistic locking is the only way to go.
So the developers you are working with are probably doing the right thing. In my book Expert Oracle Database Architecture: 9 i and 10 g Programming Techniques and Solutions (Apress, 2005), I wrote about this extensively. The following is a small excerpt; the content has been edited for style and length.
Optimistic or Pessimistic Locking?
Which method is best? In my experience, pessimistic locking works very well in Oracle Database (but perhaps not in other databases) and has many advantages over optimistic locking. However, it requires a stateful connection to the database, such as a client/server connection, because locks are not held across connections. This single fact makes pessimistic locking unrealistic in many cases today. In the past, with client/server applications and a couple dozen or hundred users, it would have been my first and only choice. Today, however, optimistic concurrency control is what I recommend for most applications. Having a connection for the entire duration of a session is just too high a price to pay.
There are many ways to implement optimistic concurrency control, including
So which do I use? I tend to use the version-column approach with a time stamp column. It gives me extra information about when a particular row was last updated. It is less computationally expensive than a hash or checksum, and it doesn't run the risk of encountering a hash or checksum in the processing of LONG, LONG RAW, CLOB, BLOB, and other very large columns.
If I had to add optimistic concurrency controls to a table that was still being used with a pessimistic locking scheme, such as if the table was accessed in client/server applications as well as on the Web, I would opt for the ORA_ROWSCN approach. That's because the existing legacy application might not appreciate it if a new column appeared, and even if I took the additional step of hiding the extra column, I might not appreciate the overhead of the trigger necessary to maintain it. The ORA_ROWSCN technique would be nonintrusive and lightweight in that respect (well, after I got over the table recreation, that is).
The hashing/checksum approach is very database-independent, especially if we compute the hashes or checksums outside of the database. However, performing the computations in the middle tier rather than in the database means higher resource usage penalties in terms of CPU usage and network transfers.
I have an EMP table with child tables, and each of the child tables has its own child tables. I want to update employee code in EMP, and I want all child tables (including child-of-child tables) to update automatically. How can I do this?
Primary keys are supposed to be immutable—unchanging, constant. If you ask me, you have a data model problem, not a SQL problem. If you feel the need to update a primary key and have it cascade, you really need to rethink your approach. You should realize that the employee code in your example is not the primary key of the EMP table—not if it changes. You would need to choose something else (even an artificial key) to be the primary key.
However, you can use an UPDATE cascade if it is truly needed (for example, as a one-time fix for data you are merging). Use deferrable constraints and a stored procedure; for example, suppose you have the following schema:
create table p ( x int primary key ); create table c1 ( x constraint c1_fk_p r references p deferrable, y int, primary key(x,y) ); create table c2 ( x int, y int, z int, constraint c2_fk_c1 foreign key(x,y) references c1 deferrable, primary key(x,y,z));
Now you can set the constraints on C1 and C2 to be deferred—not checked until you either set them as IMMEDIATE or commit the transaction—whichever comes first. This allows you to write a stored procedure, as shown in Listing 1.
Code Listing 1: Cascading changes with deferred constraints
create or replace procedure cascade_p_c1_c2 ( p_old in int, p_new in int ) as begin execute immediate 'set constraint c1_fk_p deferred'; execute immediate 'set constraint c2_fk_c1 deferred'; update p set x = p_new where x = p_old; update c1 set x = p_new where x = p_old; update c2 set x = p_new where x = p_old; execute immediate 'set constraint c1_fk_p immediate'; execute immediate 'set constraint c2_fk_c1 immediate'; end;
And now you can call this procedure, and it will be able to successfully cascade the update from the parent table P to the child table C1 and its child table C2. But again, you would do this only on the rarest of rare occasions—this is not something that should become part of your permanent design and implementation strategy.
A Parse Is a Parse Is a Parse
I have a database with "latch : library cache" problems. I'm trying to identify the source of these problems with STATSPACK.
Per Second Per Transaction ----------- --------------- ... User calls: 1,107.76 53.24 Parses: 389.92 18.74 Hard parses: 0.28 0.01 ...
Do you have any suggestions?
This is a system that parses a lot of SQL—your system is parsing about 390 times per second. The good news is that the parses are mostly soft .
The only people who can reduce parses are the developers themselves. Oracle Database parses a SQL statement every time the application tells it to, and here the application is telling the database to do it a lot.
In the 12.98 minutes of your STATSPACK information, you did about 303,669 parses. A parse requires the library cache latch.
The ultimate solution: realize that the only good parse is a nonexistent parse and have the code that executes against the database adhere to this philosophy and have it not parse so much. Keep cursors open. Do not close them until you do not need them. You can promote this approach easily by placing all SQL into stored procedures (where PL/SQL will automatically cache them—not close them—so even if you say "PL/SQL close this cursor," the database will ignore you and cache it).
If you use Java, use Java DataBase Connectivity (JDBC) statement caching, so that JDBC will ignore your developers when they say "close this cursor."
The only way to reduce latching is to reduce the number of times you do the thing that requires latching. And parsing is a huge user of latches.
One thing to look into: if you have not set the session_cached_cursors parameter, you might consider using that setting. It can make your soft parses "softer." This setting will help if your application performs the following logic over and over:
Consider this example, which I monitored by using a small test harness I use frequently called runstats. I'll start with a small routine that just parses and parses over and over again, using dynamic SQL (the parses will be mostly soft parses), as shown in Listing 2.
Code Listing 2: Parsing over and over
create or replace procedure p( p_n in number ) as l_cursor sys_refcursor; begin for i in 1 .. p_n loop open l_cursor for 'select * from dual d' || mod(i,2); close l_cursor; end loop; end;
Now, if I compare the latching differences when I run that procedure with and without session cursor caching, I'll see a large difference, as shown in Listing 3.
Code Listing 3: Latches with session_cached_cursors=0
SQL> alter session set session_cached_cursors=0; Session altered. SQL > exec runStats_pkg.rs_start; PL/SQL procedure successfully completed. SQL > exec p(1); PL/SQL procedure successfully completed. SQL > exec runStats_pkg.rs_middle; PL/SQL procedure successfully completed. SQL > exec p(100000); PL/SQL procedure successfully completed. SQL > exec runStats_pkg.rs_stop(90000); Name Run1 Run2 Diff STAT...parse count (total) 6 100,005 99,999 LATCH.shared pool simulator 8 100,012 100,004 LATCH.shared pool 10 100,053 100,043 LATCH.library cache lock 36 400,044 400,008 LATCH.library cache 67 400,093 400,026 Run1 latches total versus runs -- difference and pct Run1 Run2 Diff Pct 361 11,001,012 1,000,651 .04% PL/SQL procedure successfully completed.
So, Listing 3 shows that 100,000 soft parses needed about 11,000,000 latches—most on the library cache—when I parsed over and over and over. Now I set the session_cached_cursors parameter as follows:
SQL> alter session set session_cached_cursors=100; Session altered.
I run the same example and observe numbers similar to those in Listing 4.
As you can see in Listing 4, there is a huge reduction in latching, but the parsing—the raw number of times I parse—is the same, because the session cursor cache came into play and made the soft parses softer .
Code Listing 4: Latches with session_cached_cursors=100
Name Run1 Run2 Diff STAT...parse count (total) 6 100,005 99,999 STAT...execute count 6 100,005 99,999 STAT...session cursor cache hi 2 100,001 99,999 STAT...calls to get snapshots 2 100,001 99,999 STAT...opened cursors cumulati 6 100,005 99,999 STAT...recursive calls 5 300,002 299,997 Run1 latches total versus runs -- difference and pct Run1 Run2 Diff Pct 304 845 541 35.98%
Stringing Them Up
I need to take the results of a query and pivot a value. That is, I would like the output from the EMP table to look like this:
DEPTNO ENAME ------------ -------------------- 10 clark king miller 20 adams ford ... ...
Can this be done in just SQL?
With the addition of analytic functions in Oracle8i Release 2 and the SYS_CONNECT_BY_PATH() function in Oracle9i Database Release 1, this became something you can do rather easily in SQL. Take the following approach:
1. Partition the data by DEPTNO and, for each DEPTNO, sort the data by ENAME, and assign a sequential number by using the ROW_NUMBER() analytic function.
2. Use a CONNECT BY query, starting with ROW_NUMBER() equal to 1 and connecting that record to the same DEPTNO value with ROW_NUMBER() equal to 2, and so on. So, you eventually end up with a record that is the result of connecting 1 to 2 to 3 to 4, and so on, for each DEPTNO value.
3. Select just the "longest connect by path" for each DEPTNO value—the longest connect by path for each DEPTNO value will have all of the ENAME values gathered together.
The SYS_CONNECT_BY_PATH() function will return the list of concatenated ENAME values.
The query looks like this:
select deptno, max(sys_connect_by_path (ename, ' ' )) scbp from (select deptno, ename, row_number() over (partition by deptno order by ename) rn from emp ) start with rn = 1 connect by prior rn = rn-1 and prior deptno = deptno group by deptno order by deptno / DEPTNO SCBP --------- ---------------------------------- 10 CLARK KING MILLER 20 ADAMS FORD JONES SCOTT ... 30 ALLEN BLAKE JAMES MARTIN ...
Cutting Down on Redo
I have a PL/SQL package that copies data across multiple databases by using a database link to our data warehouse. It uses BULK_COLLECT with a limit of 1,000 to 2,500, depending on the column count within each table. I am committing outside the loop, so there is only one commit.
The DBAs are stating that the amount of REDO is incredible, and they have had to increase the disk space available for Oracle Database to catch up. They haven't specifically stated that the issue is from my code, but the issue appeared around the time of my code implementation.
How can I monitor or ensure that the logging is optimized when writing code?
By doing things in single SQL statements as much as possible. The biggest impact you can have on REDO generation is to limit the amount of work done per call (per execution).
Also, look for opportunities to do direct-path, nonlogged operations. (But coordinate with your DBAs! They need to do a backup right after the operation.)
Look at the difference in REDO generated between a row-by-row (slow-by-slow) approach and a single SQL statement, shown in Listing 5.
Code Listing 5: Minimizing redo
SQL> create table t ( x int primary key, y char(10), z date ); Table created. SQL > create index t_idx1 on t(y); Index created. SQL > create index t_idx2 on t(z); Index created. SQL > @mystat "redo size" NAME VALUE ------------- ------------- redo size 84538288 SQL > begin 2 for x in (select object_id a, 'x' b, created c from all_objects) 3 loop 4 insert into t values ( x.a, x.b, x.c ); 5 end loop; 6 end; 7 / PL/SQL procedure successfully completed. SQL > @mystat2 NAME VALUE DIFF ------------- ------------- ---------- redo size 144124840 59,586,552 SQL > truncate table t; Table truncated. SQL > @mystat "redo size" SQL > set echo off NAME VALUE ------------- ------------ redo size 144294508 SQL > begin 2 insert into t select object_id, 'x', created from all_objects; 3 end; 4 / PL/SQL procedure successfully completed. SQL > @mystat2 NAME VALUE DIFF ------------- ----------- ----------- redo size 168114280 23,819,772
That is 59MB of REDO with a row-by-row insert versus 23MB with a single, efficient SQL statement!
Tom Kyte has worked for Oracle since 1993. He is a vice president in the Oracle Public Sector group and the author of Expert Oracle Database Architecture: 9 i and 10 g Programming Techniques and Solutions (Apress, 2005) and Effective Oracle by Design (Oracle Press, 2003), among others.