As Published In
Oracle Magazine
November/December 2006


On Rescue Analytics and Popularity

By Tom Kyte Oracle Employee ACE

Our technologist explains the saving power of analytics and shares popularity.

ROW_NUM is the primary key of this table, and I need to update COLA and COLB to fill up the nulls with the previous non-null values. After the update, SELECT * should give the following results: 

SQL> select * from t1;

  ROW_NUM   COLA          COLB
---------   ----------    --------                                              
        1   Category 1    Mango 
        2   Category 1    Mango
        3   Category 1    Mango
        4   Category 1    Banana
        5   Category 1    Banana
        6   Category 1    Banana
        7   Category 2    Vanilla
        8   Category 2    Vanilla
        9   Category 2    Strawberry
9 rows selected.

Suppose I have the following table: 

SQL> select * from t1;

  ROW_NUM   COLA          COLB
---------   ----------    --------
        1   Category 1    Mango
        4                 Banana
        7    Category 2   Vanilla
        9                 Strawberry
9 rows selected.

Is this possible with one SQL statement, or do I need to write a stored procedure?

Using analytic functions, you can materialize this data easily. I'll show both approaches to carrying down the last non-null value. This is a problem people have to solve many times in a data warehouse when performing time series analysis against sparse data.

In this case, the Oracle Database 10g Release 2 query would look like this: 

SQL> select
  2    row_num,
  3    last_value(cola ignore nulls)
  4       over (order by row_num) cola,
  5    last_value(colb ignore nulls)
  6       over (order by row_num) colb
  7    from t1
  8   order by row_num
  9  /

  ROW_NUM   COLA          COLB
---------   ----------    --------
        1   Category 1    Mango
        2   Category 1    Mango
        3   Category 1    Mango
        4   Category 1    Banana
        5   Category 1    Banana
        6   Category 1    Banana
        7   Category 2    Vanilla
        8   Category 2    Vanilla
        9   Category 2    Strawberry
9 rows selected.

That release added the IGNORE NULLS clause for certain analytic functions, such as LAST_VALUE (which retrieves the last value of a given column in an analytic window). With IGNORE NULLS, you are able to retrieve the last non-null value of any given column in an analytic window, effectively allowing you to carry the value down and populate that column. Now, in earlier releases you did not have this capability and had to be a bit more creative in your approach; an equivalent query (assuming ROW_NUM is a 10-digit positive number) in an earlier release would have been 

     when cola is not null
    ) over (order by row_num),
    11 ) cola,
     when colb is not null
    ) over (order by row_num),
    11 ) colb
  from t1
 order by row_num

If you are interested in understanding how that works, you can review, "On Format, Negation, and Sliding," an earlier Ask Tom column in which I used a similar carry-down technique to group data.

Once I have that query, updating the original source data becomes as easy as this merge: 

SQL> merge into t1
  2    using (
  3    select
  4      row_num,
  5      last_value(cola ignore nulls)
  6        over (order by row_num) cola,
  7      last_value(colb ignore nulls)
  8       over (order by row_num) colb
  9       from t1
 10    ) t2
 11   on (t1.row_num = t2.row_num)
 12    when matched
 13    then update
 14      set cola = t2.cola,
 15           colb = t2.colb;
9 rows merged.

That again uses an Oracle Database 10g feature: a merge with just the UPDATE component. In earlier releases, I would have had to supply not only a WHEN MATCHED clause but also a WHEN NOT MATCHED clause. In this case, because the USING set of data is based entirely on the data I am merging into, I know that the WHEN NOT MATCHED clause will never happen (because there cannot be any ROW_NUM in T2 that is not in T1). So I can just use a dummy WHEN MATCHED that tries to insert a NULL into ROW_NUM: 

merge into t1
second query from above
) t2
on (t1.row_num = t2.row_num)
when matched
then update
  set cola = t2.cola,
       colb = t2.colb
when not matched
then insert (row_num)
        values (NULL);

And that does it.

The Most Popular Answer Ever

It was first posted more than five years ago, and it is the most-read question and answer on Ask Tom (—with almost a quarter of a million views as of this writing. Here it is:

I want to declare multiple cursors based on the values passed through a procedure, and only the WHERE conditions of the cursors will change. The body of the procedure is the same for all the cursors otherwise.

This sounds like a good use of ref cursors to me. Suppose you wanted to build a generic routine that would look at the inputs passed to it and build a WHERE clause for each NON-NULL parameter passed. This would result in a large number of statically defined cursors, so you would use a ref cursor instead, allowing you to do this dynamically.

I'll demonstrate below. I'll write a routine that will print out some EMP data. This routine will take up to three inputs to constrain the result set. I want to have up to eight different cursors possible here: 

  • One with no WHERE clause (all inputs null)

  • Three with a single predicate

  • Three with "pairs" of predicate conditions

  • One with all three predicate conditions

Additionally, because the use of bind variables is one of the most important things in Oracle programming, I'll want to make sure I use them as well. This will be tricky, because I don't know if I'll have zero, one, two, or three of them until runtime. I'll use an application context to solve that problem.

Here is a sample implementation: 

SQL> create or replace 
  2    context MY_CTX 
  3    using MY_PROCEDURE
  4    /
Context created.

That created my application context and bound it to my yet-to-be-created procedure MY_PROCEDURE. Note that only MY_PROCEDURE will be able to set values in this context. Now for convenience I'll wrap DBMS_OUTPUT.PUT_LINE in a small routine. This is to handle strings larger than 255 characters (not necessary in Oracle Database 10g Release 2, in which the line size limit is increased to 32K). 

SQL> create or replace 
        procedure p ( p_str in varchar2 )
  2    is
  3      l_str    long := p_str||chr(10);
  4      l_piece long;
  5      n         number;
  6  begin
  7    loop
  8      exit when l_str is null;
  9    n := instr( l_str, chr(10) );
 10   l_piece := 
          substr( l_str, 1, n-1 );
 11   l_str   := 
          substr( l_str, n+1 );
 12   loop
 13     exit when l_piece is null;
 14     dbms_output.put_line
         ( substr( l_piece, 1, 250));
 15     l_piece := 
           substr( l_piece, 251 );
 16   end loop;
 17  end loop;
 18 end;
 19 /
Procedure created.

I use this P routine later to dump the dynamically generated query so I can see what was built for each execution. It is not really relevant to the example, just part of the demonstration.

Now for the "meat" of the answer. Here's the routine that dynamically constructs a predicate for us: 

create or replace
procedure my_procedure
( p_ename     in varchar2 default NULL,
  p_hiredate  in date default NULL,
  p_sal          in number default NULL)
  type rc is REF CURSOR;
  l_cursor rc;
  l_query  varchar2(512)
                  default 'select * 
                            from emp 
                            where 1 = 1 ';
  cursor l_template is select * from emp;
  l_rec  l_template%rowtype;

Here I use what I call "template" cursors. I like to use these with dynamically opened ref cursors. I use them to define a record to fetch into. Here, in this simple example, I could have skipped it and just defined l_rec as EMP%rowtype, but I wanted to show how this would work if I didn't issue SELECT * on a single table but on many columns from many tables. This just helps me create a nice record type for PL/SQL. The template query has only a SELECT and a FROM. I never put a WHERE clause on it (even when joining), because I never actually open it. I just use it to get the default datatypes, names, and so on for a record definition right below it. Also, note the where 1 = 1 trick. That is so I can just append zero, one, or more predicates to this query without having to figure out if I need to append the WHERE condition or the AND condition. And because I started the WHERE clause, I just add AND conditions. Note that if you are joining multiple tables, you'll already have a predicate (using the old-style join conditions) and won't need the where 1 = 1 trick. 

   if ( p_ename is NOT NULL ) 
    ( 'MY_CTX', 'ENAME',
    l_query := l_query ||
     ' and ename like
       sys_context( ''MY_CTX'', 
                         ''ENAME'' ) ';
   end if;

The technique I'm using here is that for each input, I inspect it to see if it is non-null. If it is, I add it to the WHERE clause and set the value in the context. Note how in the WHERE clause, I always use the SYS_CONTEXT function. I never put the literal value into the query—that would be very bad for 

  • Performance

  • Scalability

  • Shared pool utilization

  • Perhaps most important: security (SQL injection)

To read about SQL injection, you can refer to, an earlier Ask Tom column on that subject.

Also, note how I had to double the quote marks to get a single quote mark in the character string literal. In Oracle Database 10g Release 1 and later, I could use the new quoting method for string literals introduced in that release: 

 l_query := l_query ||
  q'| and ename like
      sys_context( 'MY_CTX', 
                        'ENAME' ) |';

To provide a response that works in all current releases of Oracle Database, I'll use Oracle9i Database and the earlier approach of using double quote marks in the remaining text. Now, continuing on, I process the HIREDATE column: 

if ( p_hiredate is NOT NULL ) 
   ( 'MY_CTX', 'HIREDATE',
   l_query := l_query ||
    ' and hiredate >
       sys_context( ''MY_CTX'',
                         ''HIREDATE'' ),
             ''yyyymmddhh24miss'') ';
end if;

Note here how I am careful to preserve the date and time component (you are the only one who knows if this is necessary for your application). Also, always wrap the SYS_CONTEXT call in a TO_DATE call when you are comparing with a DATE, to avoid implicit conversions in the query at runtime. Last, I process the third column, SAL: 

if ( p_sal is NOT NULL ) 
   ( 'MY_CTX', 'SAL', p_sal);
   l_query := l_query ||
     ' and sal >
          sys_context( ''MY_CTX'',
                            ''SAL'' )
        ) ';
end if;

Note the explicit conversion for the NUMBER here. Use TO_NUMBER to avoid implicit conversions. Now I am ready to debug the query, using my P routine, and open the ref cursor: 

  p( l_query );

  open l_cursor for l_query;

   fetch l_cursor into l_rec;
   exit when l_cursor%notfound;
   ( l_rec.ename || ',' ||
     l_rec.hiredate || ',' ||
     l_rec.sal );
  end loop;

  close l_cursor;

And that's it. I now have a routine that will open one of eight possible different cursors. Here is a small test run just to see how it works: 

SQL> exec my_procedure
select * from emp where 1 = 1
SMITH,17-dec-1980 00:00:00,800
PL/SQL procedure successfully completed.

SQL> exec my_procedure(p_ename=>'a')
select * from emp 
where 1 = 1  
and ename like
sys_context( 'MY_CTX', 'ENAME' )
ALLEN,20-feb-1981 00:00:00,1600
JAMES,03-dec-1981 00:00:00,950
PL/SQL procedure successfully completed.

And so on. Because this question and answer is the most-read, it is also one of the largest pages on Ask Tom. 

Subqueries or Joins?

In terms of database performance, which one is better—joins or subqueries? Can you explain with an example? Also, what is the difference between nested loops and hash joins, and how do you determine which one to use for better performance?

Well, in general, joins and subqueries are semantically different. They may return entirely different results and are not interchangeable. What you should do to choose is 

  • Use a subquery when you need no columns from the tables referenced in the subquery

  • Use a join when you do need some of the columns

For example 

select * 
  from emp 
 where deptno in 
( select deptno 
    from dept );

would be "better" than

select emp.* 
  from emp, dept 
 where emp.deptno 
    = dept.deptno;

But for purely semantic reasons, the first query is more meaningful. It says "get me every row from EMP such that EMP.DEPTNO is in the DEPT table." The second query says "join EMP to DEPT." You have to read more into the query yourself to understand its goal (the question being asked). To the optimizer, those two particular queries are identical, and the performance will be the same.

And remember, a subquery cannot simply be replaced by a join (and vice versa), because they often result in different answers . Consider: 

SQL> select *
  2       from dept
  3      where deptno in
  4     (select deptno
  5       from emp)
  6    /

------   -------    ------                                     
    30   SALES      CHICAGO

SQL>  select dept.*
  2        from dept, emp
  3      where dept.deptno
  4            = emp.deptno
  5     /

------   -------    ------
          30  SALES           CHICAGO
14 rows selected.

Next Steps

 ASK Tom
Tom Kyte answers your most difficult technology questions. Highlights from that forum appear in this column.

 READ more Tom
Oracle Database Concepts 11g Release 2 (11.2)
Expert Oracle Database Architecture: Oracle Database Programming 9I, 10g, and 11g Techniques and Solutions, Second Edition

 READ more about analytics
"On Format, Negation, and Sliding"

 Performance Tuning Guide

what language to use

 DOWNLOAD Oracle Database 10g Express Edition (Oracle Database XE)


The optimizer knows what to do—use the proper construct based on the question being asked. Subqueries and joins are not interchangeable in general; use the one that conveys the most meaning.

As for the access paths—the join types—the Performance Tuning Guide at covers these topics: 

What Language to Use

There are popular questions, and there are just plain "hot" questions. By hot I mean questions that gather lots of heated feedback. Recently, I was asked:

I have a client using PL/SQL for both back-end database code and front-end presentation. I think it's more appropriate to use 3GL languages, such as Java and .NET, for the presentation and business logic tier, and to reserve PL/SQL for the data-intensive processes. What's your opinion on this?

I found the question a bit ironic, given that the site where the person asked this particular question (Ask Tom) uses PL/SQL for both back-end database code and front-end presentation via Oracle Application Express (formally known as Oracle HTML DB).

You can probably guess the gist of my answer, which was: There is more than one language, and it can make good sense to use PL/SQL entirely, or Java, or .NET, or whatever language happens to lend itself best to the task at hand. A good back-and-forth discussion followed. 


Tom Kyte Headshot

Tom Kyte is a database evangelist in Oracle’s Server Technologies division and has worked for Oracle since 1993. He is the author of Expert Oracle Database Architecture (Apress, 2005, 2010) and Effective Oracle by Design (Oracle Press, 2003), among other books.


Send us your comments