TECHNOLOGY: Ask Tom
On Rescue Analytics and Popularity
By Tom Kyte
Our technologist explains the saving power of analytics and shares his most popular answers.
Suppose I have the following table:

SQL> select * from t1;

  ROW_NUM COLA       COLB
--------- ---------- ----------
        1 Category 1 Mango
        2
        3
        4            Banana
        5
        6
        7 Category 2 Vanilla
        8
        9            Strawberry

9 rows selected.

ROW_NUM is the primary key of this table, and I need to update COLA and COLB to fill in the nulls with the previous non-null values. After the update, SELECT * should give the following results:

SQL> select * from t1;

  ROW_NUM COLA       COLB
--------- ---------- ----------
        1 Category 1 Mango
        2 Category 1 Mango
        3 Category 1 Mango
        4 Category 1 Banana
        5 Category 1 Banana
        6 Category 1 Banana
        7 Category 2 Vanilla
        8 Category 2 Vanilla
        9 Category 2 Strawberry

9 rows selected.
Is this possible with one SQL statement, or do I need to write a stored procedure?
Using analytic functions, you can materialize this data easily. I'll show two approaches to carrying down the last non-null value: one for Oracle Database 10g Release 2 and later and one for earlier releases. This is a problem people have to solve many times in a data warehouse when performing time series analysis against sparse data.
In this case, the Oracle Database 10g Release 2 query would look like this:
SQL> select
  2    row_num,
  3    last_value(cola ignore nulls)
  4      over (order by row_num) cola,
  5    last_value(colb ignore nulls)
  6      over (order by row_num) colb
  7  from t1
  8  order by row_num
  9  /

  ROW_NUM COLA       COLB
--------- ---------- ----------
        1 Category 1 Mango
        2 Category 1 Mango
        3 Category 1 Mango
        4 Category 1 Banana
        5 Category 1 Banana
        6 Category 1 Banana
        7 Category 2 Vanilla
        8 Category 2 Vanilla
        9 Category 2 Strawberry

9 rows selected.
That release added the IGNORE NULLS clause for certain analytic functions, such as LAST_VALUE (which retrieves the last value of a given column in an analytic window). With IGNORE NULLS, you are able to retrieve the last non-null value of any given column in an analytic window, effectively allowing you to carry the value down and populate that column. Now, in earlier releases you did not have this capability and had to be a bit more creative in your approach; an equivalent query (assuming ROW_NUM is a 10-digit positive number) in an earlier release would have been
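The carry-down that LAST_VALUE ... IGNORE NULLS performs can be sketched outside the database. Here is the same ordered scan in Python (a sketch, not Oracle code; the function name carry_down is mine, purely for illustration):

```python
# Sketch of what LAST_VALUE(... IGNORE NULLS) OVER (ORDER BY row_num) does:
# a single pass over the rows, remembering the most recent non-null value.
def carry_down(values):
    """Replace each None with the most recent non-None value seen so far."""
    last = None
    out = []
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out

cola = ["Category 1", None, None, None, None, None, "Category 2", None, None]
colb = ["Mango", None, None, "Banana", None, None, "Vanilla", None, "Strawberry"]

print(carry_down(cola))  # ['Category 1'] x 6, then ['Category 2'] x 3
print(carry_down(colb))
```

Run against the COLA and COLB values of the sample table, this produces exactly the carried-down columns shown in the query output above.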
select row_num,
       substr( max( case when cola is not null
                         then to_char(row_num,'fm0000000000')||cola
                    end )
               over (order by row_num), 11 ) cola,
       substr( max( case when colb is not null
                         then to_char(row_num,'fm0000000000')||colb
                    end )
               over (order by row_num), 11 ) colb
  from t1
 order by row_num
/
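If you want to see why that works, here is the trick in miniature in Python (a sketch; the names are mine): each non-null value is tagged with its zero-padded row number, a running MAX over the tagged strings always picks the tag from the largest row number seen so far, and the final slice plays the role of SUBSTR(..., 11), stripping the tag back off.

```python
# Sketch of the pre-10g carry-down trick:
#   to_char(row_num,'fm0000000000')||value  ->  f"{row_num:010d}{value}"
#   max(...) over (order by row_num)        ->  running max over a scan
#   substr(..., 11)                         ->  [10:] slice
def carry_down_max(rows):
    """rows: list of (row_num, value) in row_num order."""
    running_max = None
    out = []
    for row_num, value in rows:
        if value is not None:
            tagged = f"{row_num:010d}{value}"
            if running_max is None or tagged > running_max:
                running_max = tagged
        # Strings compare left to right, so a larger zero-padded row number
        # always wins; dropping the 10-digit prefix recovers the value.
        out.append(running_max[10:] if running_max is not None else None)
    return out

rows = [(1, "Mango"), (2, None), (3, None), (4, "Banana"), (5, None),
        (6, None), (7, "Vanilla"), (8, None), (9, "Strawberry")]
print(carry_down_max(rows))
```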
If you are interested in understanding how that works, you can review oracle.com/technetwork/issue-archive/o24asktom-095715.html, "On Format, Negation, and Sliding," an earlier Ask Tom column in which I used a similar carry-down technique to group data.
Once I have that query, updating the original source data becomes as easy as this merge:
SQL> merge into t1
  2  using (
  3    select
  4      row_num,
  5      last_value(cola ignore nulls)
  6        over (order by row_num) cola,
  7      last_value(colb ignore nulls)
  8        over (order by row_num) colb
  9    from t1
 10  ) t2
 11  on (t1.row_num = t2.row_num)
 12  when matched
 13  then update
 14    set cola = t2.cola,
 15        colb = t2.colb;

9 rows merged.
That again uses an Oracle Database 10g feature: a merge with just the UPDATE component. In earlier releases, I would have had to supply not only a WHEN MATCHED clause but also a WHEN NOT MATCHED clause. In this case, because the USING set of data is based entirely on the data I am merging into, I know that the WHEN NOT MATCHED clause will never happen (because there cannot be any ROW_NUM in T2 that is not in T1). So I can just use a dummy WHEN NOT MATCHED that tries to insert a NULL into ROW_NUM:
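To see the whole flow end to end, here is a runnable sketch in SQLite via Python (not Oracle): the carried-down values are computed client-side in one ordered scan and then applied with bind-variable updates, standing in for the update-only MERGE.

```python
import sqlite3

# Sketch of the overall flow against the sample T1 table: load the sparse
# data, carry the last non-null values down, and update every row.
conn = sqlite3.connect(":memory:")
conn.execute("create table t1 (row_num integer primary key, cola text, colb text)")
conn.executemany(
    "insert into t1 values (?, ?, ?)",
    [(1, "Category 1", "Mango"), (2, None, None), (3, None, None),
     (4, None, "Banana"), (5, None, None), (6, None, None),
     (7, "Category 2", "Vanilla"), (8, None, None), (9, None, "Strawberry")])

last_a = last_b = None
updates = []
for row_num, cola, colb in conn.execute("select * from t1 order by row_num"):
    last_a = cola if cola is not None else last_a   # carry COLA down
    last_b = colb if colb is not None else last_b   # carry COLB down
    updates.append((last_a, last_b, row_num))

# Apply the computed values; the ? placeholders are bind variables.
conn.executemany("update t1 set cola = ?, colb = ? where row_num = ?", updates)
for row in conn.execute("select * from t1 order by row_num"):
    print(row)
```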
merge into t1
using ( second query from above ) t2
on (t1.row_num = t2.row_num)
when matched
then update set cola = t2.cola,
                colb = t2.colb
when not matched
then insert (row_num) values (NULL);
And that does it.
The Most Popular Answer Ever
It was first posted more than five years ago, and it is the most-read question and answer on Ask Tom (asktom.oracle.com)—with almost a quarter of a million views as of this writing. Here it is:
I want to declare multiple cursors based on the values passed through a procedure, and only the WHERE conditions of the cursors will change. The body of the procedure is the same for all the cursors otherwise.
This sounds like a good use of ref cursors to me. Suppose you wanted to build a generic routine that would look at the inputs passed to it and build a WHERE clause for each NON-NULL parameter passed. This would result in a large number of statically defined cursors, so you would use a ref cursor instead, allowing you to do this dynamically.
I'll demonstrate below. I'll write a routine that will print out some EMP data. This routine will take up to three inputs to constrain the result set, so there are up to eight different cursors possible here, one for each combination of supplied and omitted inputs.
Additionally, because the use of bind variables is one of the most important things in Oracle programming, I'll want to make sure I use them as well. This will be tricky, because I don't know if I'll have zero, one, two, or three of them until runtime. I'll use an application context to solve that problem.
Here is a sample implementation:
SQL> create or replace
  2  context MY_CTX
  3  using MY_PROCEDURE
  4  /

Context created.
That created my application context and bound it to my yet-to-be-created procedure MY_PROCEDURE. Note that only MY_PROCEDURE will be able to set values in this context. Now for convenience I'll wrap DBMS_OUTPUT.PUT_LINE in a small routine. This is to handle strings larger than 255 characters (not necessary in Oracle Database 10g Release 2, in which the line size limit is increased to 32K).
SQL> create or replace procedure p ( p_str in varchar2 )
  2  is
  3     l_str   long := p_str||chr(10);
  4     l_piece long;
  5     n       number;
  6  begin
  7     loop
  8        exit when l_str is null;
  9        n := instr( l_str, chr(10) );
 10        l_piece := substr( l_str, 1, n-1 );
 11        l_str := substr( l_str, n+1 );
 12        loop
 13           exit when l_piece is null;
 14           dbms_output.put_line( substr( l_piece, 1, 250 ) );
 15           l_piece := substr( l_piece, 251 );
 16        end loop;
 17     end loop;
 18  end;
 19  /

Procedure created.
I use this P routine later to dump the dynamically generated query so I can see what was built for each execution. It is not really relevant to the example, just part of the demonstration.
Now for the "meat" of the answer. Here's the routine that dynamically constructs a predicate for us:
create or replace procedure my_procedure
( p_ename    in varchar2 default NULL,
  p_hiredate in date     default NULL,
  p_sal      in number   default NULL )
as
   type rc is REF CURSOR;
   l_cursor rc;
   l_query  varchar2(512)
            default 'select * from emp where 1 = 1 ';
   cursor l_template is
      select * from emp;
   l_rec l_template%rowtype;
Here I use what I call "template" cursors. I like to use these with dynamically opened ref cursors; I use them to define a record to fetch into. In this simple example, I could have skipped the template and just defined l_rec as EMP%rowtype, but I wanted to show how this would work if I weren't issuing SELECT * on a single table but instead selecting many columns from many tables. The template query has only a SELECT and a FROM. I never put a WHERE clause on it (even when joining), because I never actually open it; I just use it to get the datatypes, names, and so on for the record definition right below it.

Also, note the where 1 = 1 trick. That is so I can append zero, one, or more predicates to this query without having to figure out whether I need the WHERE keyword or the AND keyword. Because I have already started the WHERE clause, I just add AND conditions. Note that if you are joining multiple tables, you'll already have a predicate (using the old-style join conditions) and won't need the where 1 = 1 trick.
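The where 1 = 1 trick is easy to see in miniature. This Python sketch (build_query is my name for it, and the named bind placeholders stand in for the SYS_CONTEXT calls) appends only AND conditions, never deciding between WHERE and AND:

```python
# Sketch of building the dynamic predicate with binds. None means the
# caller did not supply that input, just like the PL/SQL defaults.
def build_query(ename=None, hiredate=None, sal=None):
    query = "select * from emp where 1 = 1"   # the where 1 = 1 trick
    binds = {}
    if ename is not None:
        query += " and ename like :ename"
        binds["ename"] = "%" + ename.upper() + "%"
    if hiredate is not None:
        query += " and hiredate > :hiredate"
        binds["hiredate"] = hiredate
    if sal is not None:
        query += " and sal > :sal"
        binds["sal"] = sal
    return query, binds

q, b = build_query(ename="a", sal=1000)
print(q)  # select * from emp where 1 = 1 and ename like :ename and sal > :sal
print(b)  # {'ename': '%A%', 'sal': 1000}
```

With no inputs at all, the query degenerates harmlessly to select * from emp where 1 = 1, which is the first of the eight possible cursors.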
begin
   if ( p_ename is NOT NULL ) then
      dbms_session.set_context
         ( 'MY_CTX', 'ENAME', '%'||upper(p_ename)||'%' );
      l_query := l_query ||
         ' and ename like sys_context( ''MY_CTX'', ''ENAME'' ) ';
   end if;
The technique I'm using here is that for each input, I inspect it to see if it is non-null. If it is, I add it to the WHERE clause and set the value in the context. Note how in the WHERE clause, I always use the SYS_CONTEXT function. I never put the literal value into the query; that would defeat the use of bind variables and would leave the code open to SQL injection.
To read about SQL injection, you can refer to oracle.com/technetwork/issue-archive/2005/05-jan/o15asktom-084959.html, an earlier Ask Tom column on that subject.
Also, note how I had to double the quote marks to get a single quote mark in the character string literal. In Oracle Database 10g Release 1 and later, I could use the new quoting method for string literals introduced in that release:
l_query := l_query || q'| and ename like sys_context( 'MY_CTX', 'ENAME' ) |';
To provide a response that works in all current releases of Oracle Database, I'll stick with the Oracle9i Database approach of doubling the quote marks in the remaining text. Now, continuing on, I process the HIREDATE column:
   if ( p_hiredate is NOT NULL ) then
      dbms_session.set_context
         ( 'MY_CTX', 'HIREDATE',
           to_char(p_hiredate, 'yyyymmddhh24miss') );
      l_query := l_query ||
         ' and hiredate > to_date( sys_context( ''MY_CTX'', ''HIREDATE'' ),
                                   ''yyyymmddhh24miss'' ) ';
   end if;
Note here how I am careful to preserve the date and time component (you are the only one who knows if this is necessary for your application). Also, always wrap the SYS_CONTEXT call in a TO_DATE call when you are comparing with a DATE, to avoid implicit conversions in the query at runtime. Last, I process the third column, SAL:
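The round trip through the fixed 'yyyymmddhh24miss' format, preserving the time component, can be sketched in Python (datetime formatting standing in for TO_CHAR and TO_DATE):

```python
from datetime import datetime

# Sketch: the date goes into the context as a fixed-format string
# (TO_CHAR) and comes back out in the query via TO_DATE with the
# same mask, so no precision is lost and no implicit conversion occurs.
hired = datetime(1981, 2, 20, 9, 30, 0)
as_string = hired.strftime("%Y%m%d%H%M%S")            # TO_CHAR side
back = datetime.strptime(as_string, "%Y%m%d%H%M%S")   # TO_DATE side

print(as_string)      # 19810220093000
print(back == hired)  # True: the time component survives the round trip
```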
   if ( p_sal is NOT NULL ) then
      dbms_session.set_context( 'MY_CTX', 'SAL', p_sal );
      l_query := l_query ||
         ' and sal > to_number( sys_context( ''MY_CTX'', ''SAL'' ) ) ';
   end if;
Note the explicit conversion for the NUMBER here. Use TO_NUMBER to avoid implicit conversions. Now I am ready to debug the query, using my P routine, and open the ref cursor:
   p( l_query );

   open l_cursor for l_query;
   loop
      fetch l_cursor into l_rec;
      exit when l_cursor%notfound;
      dbms_output.put_line
         ( l_rec.ename || ',' ||
           l_rec.hiredate || ',' ||
           l_rec.sal );
   end loop;
   close l_cursor;
end;
And that's it. I now have a routine that will open one of eight possible different cursors. Here is a small test run just to see how it works:
SQL> exec my_procedure
select * from emp where 1 = 1
SMITH,17-dec-1980 00:00:00,800
. . .
KING,,5

PL/SQL procedure successfully completed.

SQL> exec my_procedure(p_ename=>'a')
select * from emp where 1 = 1
and ename like sys_context( 'MY_CTX', 'ENAME' )
ALLEN,20-feb-1981 00:00:00,1600
. . .
JAMES,03-dec-1981 00:00:00,950

PL/SQL procedure successfully completed.
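For readers who want to run the whole pattern outside Oracle, here is a sketch in SQLite via Python. The EMP data is invented for the demonstration, and named bind parameters play the role that SYS_CONTEXT plays in the PL/SQL version:

```python
import sqlite3

# Invented EMP data, just enough to exercise the dynamic predicate.
conn = sqlite3.connect(":memory:")
conn.execute("create table emp (ename text, hiredate text, sal integer)")
conn.executemany("insert into emp values (?, ?, ?)",
                 [("SMITH", "1980-12-17", 800), ("ALLEN", "1981-02-20", 1600),
                  ("KING", "1981-11-17", 5000), ("JAMES", "1981-12-03", 950)])

def my_procedure(ename=None, hiredate=None, sal=None):
    """Build the query for each non-null input, then open it with binds."""
    query = "select * from emp where 1 = 1"
    binds = {}
    if ename is not None:
        query += " and ename like :ename"
        binds["ename"] = "%" + ename.upper() + "%"
    if hiredate is not None:
        query += " and hiredate > :hiredate"
        binds["hiredate"] = hiredate
    if sal is not None:
        query += " and sal > :sal"
        binds["sal"] = sal
    print(query)                       # debug output, like the P routine
    return conn.execute(query, binds).fetchall()

for row in my_procedure(ename="a"):    # ALLEN and JAMES contain an A
    print(row)
```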
And so on. Besides being the most-read page, this question and answer, with all of its follow-up discussion, is also one of the largest pages on Ask Tom.
Subqueries or Joins?
In terms of database performance, which one is better—joins or subqueries? Can you explain with an example? Also, what is the difference between nested loops and hash joins, and how do you determine which one to use for better performance?
Well, in general, joins and subqueries are semantically different. They may return entirely different results and are not interchangeable. What you should do to choose is phrase the query to match the question being asked. It is not that

select * from emp
 where deptno in
     ( select deptno from dept );

would be "better" than

select emp.*
  from emp, dept
 where emp.deptno = dept.deptno;

or vice versa.
But for purely semantic reasons, the first query is more meaningful. It says "get me every row from EMP such that EMP.DEPTNO is in the DEPT table." The second query says "join EMP to DEPT." You have to read more into the query yourself to understand its goal (the question being asked). To the optimizer, those two particular queries are identical, and the performance will be the same.
And remember, a subquery cannot simply be replaced by a join (and vice versa), because they often produce different answers. Consider:
SQL> select *
  2    from dept
  3   where deptno in
  4       (select deptno
  5          from emp)
  6  /

    DEPTNO DNAME          LOC
---------- -------------- ---------
        10 ACCOUNTING     NEW YORK
        20 RESEARCH       DALLAS
        30 SALES          CHICAGO

SQL> select dept.*
  2    from dept, emp
  3   where dept.deptno
  4       = emp.deptno
  5  /

    DEPTNO DNAME          LOC
---------- -------------- ---------
        10 ACCOUNTING     NEW YORK
        10 ACCOUNTING     NEW YORK
        20 RESEARCH       DALLAS
        20 RESEARCH       DALLAS
        20 RESEARCH       DALLAS
. . .
        30 SALES          CHICAGO

14 rows selected.
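You can reproduce that difference in miniature; here is a sketch in SQLite via Python with invented DEPT and EMP data (smaller than the classic SCOTT schema, but showing the same effect):

```python
import sqlite3

# The IN subquery reports each qualifying DEPT row once; the join
# repeats each DEPT row once per matching EMP row.
conn = sqlite3.connect(":memory:")
conn.execute("create table dept (deptno integer, dname text)")
conn.execute("create table emp (ename text, deptno integer)")
conn.executemany("insert into dept values (?, ?)",
                 [(10, "ACCOUNTING"), (20, "RESEARCH"),
                  (30, "SALES"), (40, "OPERATIONS")])
conn.executemany("insert into emp values (?, ?)",
                 [("KING", 10), ("SMITH", 20), ("FORD", 20), ("BLAKE", 30)])

subquery = conn.execute(
    "select * from dept where deptno in (select deptno from emp)").fetchall()
join = conn.execute(
    "select dept.* from dept, emp where dept.deptno = emp.deptno").fetchall()

print(len(subquery))  # 3: each department with at least one employee, once
print(len(join))      # 4: one row per matching employee
```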
The optimizer knows what to do—use the proper construct based on the question being asked. Subqueries and joins are not interchangeable in general; use the one that conveys the most meaning.
As for the access paths (the join types, such as nested loops and hash joins), the Performance Tuning Guide at download.oracle.com/docs/cd/B19306_01/server.102/b14211/toc.htm covers these topics in detail.
What Language to Use
There are popular questions, and there are just plain "hot" questions. By hot I mean questions that gather lots of heated feedback. Recently, I was asked:
I have a client using PL/SQL for both back-end database code and front-end presentation. I think it's more appropriate to use 3GL languages, such as Java and .NET, for the presentation and business logic tier, and to reserve PL/SQL for the data-intensive processes. What's your opinion on this?
I found the question a bit ironic, given that the site where the person asked this particular question (Ask Tom) uses PL/SQL for both back-end database code and front-end presentation via Oracle Application Express (formerly known as Oracle HTML DB).
You can probably guess the gist of my answer, which was: There is more than one language, and it can make good sense to use PL/SQL entirely, or Java, or .NET, or whatever language happens to lend itself best to the task at hand. A good back-and-forth discussion followed.
Tom Kyte is a database evangelist in Oracle’s Server Technologies division and has worked for Oracle since 1993. He is the author of Expert Oracle Database Architecture (Apress, 2005, 2010) and Effective Oracle by Design (Oracle Press, 2003), among other books.