TECHNOLOGY: Ask Tom

As Published In
Oracle Magazine
July/August 2009

 

On Popularity and Natural Selection

By Tom Kyte

Our technologist optimizes away binds and promotes more-selective queries.

It is time to do a “flashback query” of sorts and revisit a column, “On Rescue Analytics and Popularity,” from almost three years ago (Oracle Magazine, November/December 2006, otn.oracle.com/oramag/oracle/06-nov/o66asktom.html). At that time, one answer from this column was the most-read answer on AskTom (asktom.oracle.com), and it still holds that record to this day. Here’s the original question:

I want to declare multiple cursors based on the values passed through a procedure, and only the WHERE conditions of the cursors will change. The body of the procedure is the same for all the cursors otherwise.

When I first answered this question (on AskTom in 2000!), I recommended using an application context to “bind” inputs. The technique was as follows:

 

if ( p_ename is NOT NULL ) 
then
   dbms_session.set_context
  ( 'MY_CTX', 'ENAME',
    '%'||upper(p_ename)||'%');
  l_query := l_query ||
  ' and ename like
    sys_context
    ( ''MY_CTX'', 
      ''ENAME'' ) ';
end if;


In this case, the code was checking to see if the P_ENAME parameter had a value. If it did, the code would bind

 

'%'||upper(p_ename)||'%'


into the query, using an application context. This was a two-step process: first put the value into the application context and then reference it in the WHERE clause of the query.

The goal of the procedure was to construct a query based on the inputs and return a ref cursor to the client, and the procedure permitted you to have an unknown number of bind variables and use native dynamic SQL.

To use native dynamic SQL, however, you need to know at compile time how many bind variables will be used. And because we were building the query on the fly, based on how many things we wanted to bind, that didn’t seem possible or efficient. Hence, the application context approach.

Well, a friend of mine—Bryn Llewellyn, the PL/SQL product manager at Oracle—had a look at the implementation as presented and suggested that it was overly complex. Fortunately, he not only suggested that but also reminded me about an alternative, and superior, implementation that’s been used in the field for some years. It would not only simplify the code but would also have a positive impact on performance. By using this new approach—outlined below—we’ll avoid a few function calls and the use of application contexts altogether. That last point is important in some environments, because to create a context, you would need the CREATE ANY CONTEXT privilege, and—for good reason—that might be a privilege you are not accorded.

The superior implementation suggestion was to always code a reference to every possible bind variable but to do so in such a way that the optimizer would remove the bind variable for us if we weren’t going to actually use it in that particular execution of the SQL statement. So now, because we have a fixed number of binds (equal to the number of parameters for the procedure that builds the SQL), we can use native dynamic SQL and get the best-possible plan.

I’m not going to present the old application context code in its entirety—you can see that on the Oracle Magazine Web site by following the above link. What I will present here is the new code and a comparison of the queries generated by the old and the new code.

First, the suggested code modification was to build a predicate for each possible bind input that would take one of two forms:

1. WHERE column = :bind_variable

2. WHERE (1=1 or :bind_variable is null)

We would use #1 when the bind variable was to be used to constrain the result set and #2 when the bind variable wasn’t to be supplied. The optimizer would see 1=1 or <something else> and automatically know that it was true, so it would just optimize this predicate away.
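This predicate trick is not Oracle-specific; any database interface that accepts bind parameters can use it. Here is a minimal sketch in Python using sqlite3 purely for illustration (the table contents and the query_emp helper are invented, not the article's EMP schema): every candidate bind always appears in the SQL text, and the always-true form neutralizes the ones not in use, so the bind count stays fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table emp (ename text, sal integer)")
conn.executemany("insert into emp values (?, ?)",
                 [("KING", 5000), ("SMITH", 800)])

def query_emp(ename=None, sal=None):
    sql = "select ename, sal from emp"
    # Form #1 when the parameter constrains the result set,
    # form #2 (always true) when it does not -- either way, one bind each.
    sql += (" where ename like '%'||?||'%'" if ename is not None
            else " where (1=1 or ? is null)")
    sql += (" and sal > ?" if sal is not None
            else " and (1=1 or ? is null)")
    # The bind list is always (ename, sal), known when this code is written.
    return conn.execute(sql, (ename, sal)).fetchall()
```

Calling query_emp() with no arguments returns every row, while query_emp(sal=1000) applies only the salary filter, yet both calls supply the same fixed set of binds.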

The goal for the new code is the same as for the old code, so we’ll take the old approach and recode it, using the new approach. We want to code a procedure that will accept as many as three inputs (ENAME, HIREDATE, and SAL) and dynamically construct a query against the EMP table, using those inputs. If we pass in ENAME, we want the ENAME column in every record to be like the uppercase value of the input. If we pass in HIREDATE, we want everyone hired after that date. Likewise, if we pass in SAL, we want everyone who makes more than the supplied value. If we send in two or three arguments to this procedure, we would like the filters we build to be combined with AND.

The specification of MY_NEW_PROCEDURE is shown in Listing 1. In MY_NEW_PROCEDURE, we accept our three inputs and set up some local variables. L_CURSOR is the ref cursor we would return to the client application, but because this is a quick demonstration, we will not be returning this value. Instead, we will process the cursor in the procedure itself to see the results, fetching each row into L_REC. The L_QUERY variable is, as its name implies, a string to hold the SQL statement we generate.

Code Listing 1: Specification of my_new_procedure

 

SQL> create or replace
  2    procedure my_new_procedure
  3    ( p_ename    in varchar2 default NULL,
  4      p_hiredate in date     default NULL,
  5      p_sal      in number   default NULL)
  6    as
  7       l_cursor sys_refcursor;
  8       l_query  varchar2(512)
  9       default 'select * from emp';
 10
 11      cursor l_template is select * from emp;
 12      l_rec  l_template%rowtype;
 13  begin


The code for building the query is very simple. I presented the code for “binding” P_ENAME, using an application context; here is the new approach:

 

 14    if ( p_ename is NOT NULL )
 15    then
 16      l_query := l_query ||
 17      ' where ename like
 18          ''%''||:p_ename||''%'' ';
 19    else
 20      l_query := l_query ||
 21      ' where (1=1 or
 22          :p_ename is null) ';
 23    end if;


Note that we always bind :P_ENAME into the query now. This is key—it enables us to use native dynamic SQL, because we'll have a static set of bind variables known at compile time.

The rest of the code for binding the other two columns is equally simple:

 

 24    if ( p_hiredate is NOT NULL )
 25    then
 26      l_query := l_query ||
 27      ' and hiredate > :p_hiredate ';
 28    else
 29      l_query := l_query ||
 30      ' and (1=1 or
 31          :p_hiredate is null) ';
 32    end if;
 33
 34    if ( p_sal is NOT NULL )
 35    then
 36      l_query := l_query ||
 37      ' and sal > :p_sal ';
 38    else
 39      l_query := l_query ||
 40      ' and (1=1 or
 41          :p_sal is null) ';
 42    end if;


The main body of the demonstration procedure is then as follows:

 

 43    dbms_output.put_line
 44    ( l_query );
 45
 46    open l_cursor
 47     for l_query
 48    using p_ename,
 49         p_hiredate,
 50         p_sal;
 51
 52    loop
 53      fetch l_cursor
 54       into l_rec;
 55      exit when
 56       l_cursor%notfound;
 57
 58      dbms_output.put_line
 59      ( l_rec.ename || ' , ' ||
 60        l_rec.hiredate || ' , ' ||
 61        l_rec.sal );
 62    end loop;
 63    close l_cursor;


Finally, we simply open the cursor, and in “real life” that would be the end of it (we would return the cursor to the client, which would then fetch and print out each row). To show what the optimizer is doing with our query, I’ve added a little diagnostic output at the end:

 

 64    for x in
 65    (select plan_table_output
 66       from table
 67       (dbms_xplan.display_cursor)
 68    )
 69    loop
 70       dbms_output.put_line
 71       ( x.PLAN_TABLE_OUTPUT );
 72    end loop;
 73
 74   end;
 75   /
Procedure created.


Right after we run the query, we can get the actual plan used, by invoking DBMS_XPLAN.DISPLAY_CURSOR (in Oracle Database 10g and above only). We’ll do that to verify that the optimizer removes the predicate when it is not relevant, confirming that this is a very efficient approach.

Now suppose we run MY_NEW_PROCEDURE with no inputs whatsoever. Using the old code, the resulting query is

 

select * 
  from emp 
 where 1 = 1


Using the new approach, in contrast, our query is

 

select * 
   from emp 
 where (1=1 or   
     :p_ename is null)  
 and (1=1 or
     :p_hiredate is null)  
 and (1=1 or
     :p_sal is null)


At first you might look at the queries and say, “The new way looks like a much more complex query for the database.” However, see what the optimizer did with the new query in Listing 2.

Code Listing 2: New query and plan (with no predicate information)

 

select * 
   from emp 
 where (1=1 or   
     :p_ename is null)  
  and (1=1 or
     :p_hiredate is null)  
  and (1=1 or
     :p_sal is null)

-------------------------------------------------------------------------
| Id  | Operation         | Name | Rows | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      |       |     3 (100)|          |
|   1 |  TABLE ACCESS FULL| EMP  |    14|  1218 |    3    (0)| 00:00:01 |
-------------------------------------------------------------------------
Note
-------------
dynamic sampling used for this statement


We can see that there is no Predicate Information section, which is because, from the optimizer’s standpoint, our new query was simply SELECT * FROM EMP. The entire WHERE clause was optimized away.

Now suppose we want to apply a predicate against the HIREDATE column. The old approach would have generated this query:

 

select * 
   from emp 
 where 1 = 1  
    and hiredate >
    to_date(
   sys_context( 'MY_CTX',
    'HIREDATE' ),
    'yyyymmddhh24miss')


We had to cast the application context to a date (all context values are strings—no other type is available). When we loaded the context, we had to pick a stable date format to encode the date into a string, and when we referenced it in the query, we had to decode it back out. The new query lacks that complexity—it doesn’t have those function calls:

 

select * 
   from emp 
 where (1=1 or 
     :p_ename is null) 
   and hiredate > :p_hiredate 
   and (1=1 or :p_sal is null)


Because we are binding a date, no conversions are necessary. The optimizer—given this query—comes up with the plan in Listing 3.

Code Listing 3: A query and minimal predicate information

 

select * 
   from emp 
 where (1=1 or     
     :p_ename is null)  
   and hiredate >
     :p_hiredate  
   and (1=1 or     
     :p_sal is null)

-------------------------------------------------------------------------
| Id  | Operation         | Name | Rows | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      |       |     3 (100)|          |
|*  1 |  TABLE ACCESS FULL| EMP  |    1 |    87 |    3    (0)| 00:00:01 |
-------------------------------------------------------------------------
Predicate Information (identified by operation id):
-------------------------------------------------------------------------
1 - filter("HIREDATE">:P_HIREDATE)
Note
-------------
dynamic sampling used for this statement


Note that the Predicate Information in the plan is reduced to just the relevant bit for us (“HIREDATE”>:P_HIREDATE). We can bind all three columns but have the optimizer throw away the bits that are not necessary.

It is always nice to be exposed to or reminded of a superior technique, and I now definitely prefer this approach over the application context. I have updated “the most popular answer ever” with a link to this column.

Reason #13134213 Not to Use SELECT *

There are many reasons to avoid using SELECT * in your code, and performance is a major one. SELECT * will typically force a query to access a table, whereas SELECT <just the columns you need> may well be able to avoid the table altogether (because it might be able to get the answer from the index). Code maintenance is another reason to avoid SELECT *. If someone adds a column to a table—and your code is not expecting that—your code may well break.
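To make the maintenance hazard concrete, here is a small hypothetical sketch (Python with sqlite3 standing in for the database; the table and function names are invented): client code that unpacks SELECT * rows positionally breaks the moment a column is added, while code that names its columns keeps working.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (x int, y int)")
conn.execute("insert into t values (1, 2)")

def star_reader():
    # Positional unpacking: silently tied to the table's current column count.
    [(x, y)] = conn.execute("select * from t").fetchall()
    return x + y

def explicit_reader():
    # Names its columns, so the shape of the result never changes.
    [(x, y)] = conn.execute("select x, y from t").fetchall()
    return x + y

print(star_reader(), explicit_reader())    # both work: 3 3

conn.execute("alter table t add column z int")   # someone adds a column

print(explicit_reader())                   # still works: 3
# star_reader() now raises ValueError: too many values to unpack
```

The same shape mismatch that raises an exception here is what, in PL/SQL terms, leaves SELECT *-based code needing attention after a table change.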

Oracle Database 11g adds to a very long list yet another reason to avoid SELECT *. In that release, the database dependency mechanism was extended down to the column level, so if your procedures are constructed with explicit references to just the columns you need, you will find your code being invalidated—and therefore needing to be recompiled—much less often. Consider the small example in Listing 4.

Code Listing 4: bad_proc, good_proc

 

SQL> create table t ( x int, y int );
Table created.

SQL> create or replace
  2    procedure bad_proc
  3    as
  4      cursor c
  5      is
  6      select *
  7         from t;
  8    begin
  9       null;
 10    end;
 11    /
Procedure created.

SQL> create or replace
  2    procedure good_proc
  3    as
  4      cursor c
  5      is
  6      select x, y
  7        from t;
  8    begin
  9       null;
 10    end;
 11    /
Procedure created.

SQL> select object_name, status
  2       from user_objects
  3    where object_name in
  4    ('BAD_PROC', 'GOOD_PROC')
  5    order by object_name;

OBJECT_NAME      STATUS
------------     --------
BAD_PROC         VALID
GOOD_PROC        VALID


Right now both the BAD_PROC and GOOD_PROC procedures are equivalent: they do the same thing, and they are both valid. However, BAD_PROC is dependent on the entire table T structure whereas GOOD_PROC is dependent only on X and Y in table T.

If we modify table T:

 

SQL> alter table t add z number;
Table altered.

SQL> select object_name, status
  2      from user_objects
  3    where object_name in
  4    ('BAD_PROC', 'GOOD_PROC')
  5    order by object_name;

OBJECT_NAME         STATUS
------------        --------
BAD_PROC            INVALID
GOOD_PROC           VALID


you will be able to see that the STATUS of BAD_PROC—which included SELECT *—is now INVALID, and it needs to be recompiled. GOOD_PROC, on the other hand, remains VALID. You’ll find that application upgrades and maintenance tasks greatly benefit from this new column dependency capability, provided that the developers wrote the code correctly in the first place!

Another new dependency feature in Oracle Database 11g is that it’s possible to recompile procedures without invalidating dependent code, as long as the signature of the procedure does not change. That is, if you do not modify the list of inputs and outputs of your procedures, the dependent code will not need to be recompiled. This enables you to patch your system with a significantly smaller maintenance window. Consider the procedures in Listing 5.

Code Listing 5: Creating P1 and P2

 

SQL> create or replace
  2    procedure p1
  3    as
  4    begin
  5      dbms_output.put_line
  6      ( 'hello' );
  7    end;
  8    /

Procedure created.

SQL> create or replace
  2    procedure p2
  3    as
  4    begin
  5      p1;
  6    end;
  7    /

Procedure created.

SQL> select object_name, status,
  2         to_char( last_ddl_time,
  3                     'hh24:mi:ss') ddl_time
  4    from user_objects
  5   where object_name in
  6   ('P1', 'P2')
  7   order by object_name;

OBJECT_NAME     STATUS    DDL_TIME
------------    --------- ---------
P1              VALID     08:24:40
P2              VALID     08:24:40


So, now we have the P1 and P2 procedures, P2 is dependent on P1, and both were compiled at about the same time. If we recompile P1 after modifying the code (fixing a bug, perhaps), we’ll find the result in Listing 6.

Code Listing 6: Modifying P1, but P1 and P2 procedures are still valid

 

SQL> create or replace
  2    procedure p1
  3    as
  4    begin
  5      dbms_output.put_line
  6      ( 'hello WORLD' );
  7    end;
  8    /

Procedure created.

SQL> select object_name, status,
  2         to_char( last_ddl_time,
  3                     'hh24:mi:ss') ddl_time
  4    from user_objects
  5    where object_name in
  6    ('P1', 'P2')
  7    order by object_name;

OBJECT_NAME     STATUS    DDL_TIME
------------    --------- ---------
P1              VALID     08:24:41
P2              VALID     08:24:40


In Oracle Database 10g and before, P2 would now be invalid, but because we are using Oracle Database 11g and did not change the signature of the procedure, the dependent code was not invalidated.

Oracle Database 10g added another optimization for application upgrades: the compiler checks whether newly supplied application code differs from the existing code. If the new code you “CREATE OR REPLACE” is the same as the old code—and the PL/SQL compilation environment settings are the same as they were when the code was first compiled—the database will skip compilation of the new code. This feature was added because in many cases, applications upgrade themselves by re-creating all of their code, even if only 10 percent of it has changed. This capability enables the database to skip compiling the unchanged 90 percent. We can easily see this in the following:



 

SQL> exec dbms_lock.sleep(2)
PL/SQL procedure successfully completed.

SQL> create or replace
  2    procedure p1
  3    as
  4    begin
  5       dbms_output.put_line
  6       ( 'hello WORLD' );
  7    end;
  8    /
Procedure created.

SQL> select object_name, status,
  2         to_char( last_ddl_time,
  3                     'hh24:mi:ss') ddl_time
  4        from user_objects
  5     where object_name in
  6     ('P1', 'P2')
  7     order by object_name;

OBJECT_NAME     STATUS    DDL_TIME
------------    --------- ---------
P1              VALID     08:24:41
P2              VALID     08:24:40



Note how the time stamp (DDL_TIME) for the P1 procedure did not change. We waited at least two seconds, so the time must have been 08:24:43 or later when we reissued the CREATE OR REPLACE, but because we did not modify the code, the database did not compile it. This new feature has caught some people when they upgraded their applications.

Now suppose that you had a process for upgrading your application that performed these steps:

1. Check all code out of source code control
2. Run CREATE OR REPLACE for all code
3. Query USER_OBJECTS to verify that LAST_DDL_TIME was updated

That last step is what has caught people. Most of their code hadn’t actually changed between upgrades, and hence most of it didn’t have to be compiled. LAST_DDL_TIME wasn’t modified, so their process thought something had gone wrong with the upgrade. But it is easy to modify the process and reap the benefits of this improved compilation.

Just for completeness, here’s a demonstration of what happens in Oracle Database 11g if you change the signature of your stored procedure in any way:

 

SQL> create or replace
  2    procedure p1
  3    ( x in number default NULL )
  4    as
  5    begin
  6       dbms_output.put_line
  7       ( 'hello WORLD' );
  8    end;
  9    /
Procedure created.

SQL> select object_name, status,
  2         to_char( last_ddl_time,
  3                     'hh24:mi:ss') ddl_time
  4    from user_objects
  5   where object_name in
  6   ('P1', 'P2')
  7   order by object_name;

OBJECT_NAME     STATUS    DDL_TIME
------------    --------- ---------
P1              VALID     08:24:45
P2              INVALID   08:24:40


Note that P2 is now invalid, due to the change in P1’s specification. Even though that change did not really affect P2, P1’s signature was modified, so the dependent code was invalidated and will be recompiled.

 




Tom Kyte is a database evangelist in Oracle’s Server Technologies division and has worked for Oracle since 1993. He is the author of Expert Oracle Database Architecture (Apress, 2005, 2010) and Effective Oracle by Design (Oracle Press, 2003), among other books.

 

