As Published In
Oracle Magazine
March/April 2007

TECHNOLOGY: Ask Tom


On Cursors, SQL, and Analytics

By Tom Kyte

Our technologist referees between cursors, SQL statements, analytics, and aggregates.

I must make a new table based on the data of, for example, three tables. I think that I should use a single query, something like this: 

create table t 
as 
select * 
from t1 left join t2 on t1.field1 = t2.field1

But my boss said that it is most effective to use a cursor like this: 

for cursor in (select * from t1)
loop
  begin
    select * into record 
      from t2 
     where t2.field1=cursor.field1;
    insert into t (...) 
    values (cursor,record);
  exception
    when no_data_found then 
     insert into t (...) 
     values (cursor,nulls);
  end;
end loop;

Which is the correct approach?

Your boss has provided the single most inefficient approach to loading data. I have a name for this type of approach: I call it SLOW by SLOW processing. Avoid it. In fact, this procedural approach goes beyond normal SLOW by SLOW processing and takes it to a new level—it has introduced a "do-it-yourself nested loops join." I would seriously consider getting a new mentor for yourself; you'll be better for it long-term!

I have a pretty simple mantra when it comes to developing database software, and I have written this many times over the years:  

  • You should do it in a single SQL statement if at all possible.

  • If you cannot do it in a single SQL statement, do it in PL/SQL.

  • If you cannot do it in PL/SQL, try a Java stored procedure.

  • If you cannot do it in Java, do it in a C external procedure.

  • If you cannot do it in a C external procedure, you might want to seriously think about why it is you need to do it.

If you can do it in a single SQL statement, by all means do it in a single SQL statement. Do not waste time, energy, and CPU cycles writing procedural code that will run slower than regular SQL.

This question comes up frequently on Ask Tom, and my answer has always been very consistent. If you don't have to write procedural code, don't. If you can erase many lines of procedural code and replace them with a single SQL statement, do so. Less code = fewer bugs; more code = more bugs. Remove bugs by utilizing SQL.
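The contrast is not Oracle-specific. Here is an illustrative sketch (in Python, with made-up tables t1 and t2) of the questioner's left outer join done slow by slow versus in one set-based pass:

```python
# Illustrative sketch (not Oracle code): the same left outer join done
# "slow by slow" versus in a single set-based pass.
t1 = [{"field1": 1, "a": "x"}, {"field1": 2, "a": "y"}, {"field1": 3, "a": "z"}]
t2 = [{"field1": 1, "b": "p"}, {"field1": 3, "b": "q"}]

# Slow by slow: one lookup per t1 row -- a do-it-yourself nested loops join.
slow = []
for r1 in t1:
    match = next((r2 for r2 in t2 if r2["field1"] == r1["field1"]), None)
    slow.append({**r1, "b": match["b"] if match else None})

# Set-based: build the lookup once, then join in a single pass (a hash join).
lookup = {r2["field1"]: r2["b"] for r2 in t2}
fast = [{**r1, "b": lookup.get(r1["field1"])} for r1 in t1]

assert slow == fast  # same result; the second form does far less work per row
```

The point is the same as with the SQL: the database's join machinery (like the dict lookup here) does in one operation what the loop re-does for every single row.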

Analytics Versus Aggregates

Are analytical functions a replacement for aggregate functions? Some of the developers here are hurrying to replace the GROUP BY aggregate functions with analytical functions, just because they learned something new. Are analytical functions always faster than aggregate functions?

I am a huge fan of analytic functions—I've often said that analytics are the greatest thing to happen to SQL since the SELECT keyword! However, it does not make any sense at all to replace aggregate functions with analytics—none at all—because 

  • Aggregates take many rows and collapse them into one

  • Analytics do not collapse many rows into one—there is no row collapsing

Analytics and aggregates are orthogonal concepts, really. So, no, it is not sensible to replace aggregates with analytics. I don't even know how you would do it (short of using a DISTINCT after computing the analytic result for lots of rows, which would be a bad idea indeed). For example, to show the SUM of SAL by JOB from the EMP table, you would code 

SQL> select job, sum(sal)
  2    from emp
  3    group by job
  4    order by job
  5    /

JOB          SUM(SAL)
----------   ------------
ANALYST              6000
CLERK                4150
MANAGER              8275
PRESIDENT            5000
SALESMAN             5600

Now, if you were to use analytics to accomplish that, you would have 

SQL> select DISTINCT
  2             job,
  3             sum(sal) over 
       (partition by job) sumsal 
  4      from emp
  5   order by job
  6  /

JOB          SUMSAL
----------   ------------
ANALYST              6000
CLERK                4150
MANAGER              8275
PRESIDENT            5000
SALESMAN             5600

It gives the same answer, but at a price. If you compare the plans for each query, shown in Listing 1, you'll see that the analytic function performs

Code Listing 1: Plan performance of analytics and aggregation 

SQL> select DISTINCT
  2             job,
  3             sum(sal) over (partition by job) sumsal
  4       from emp
  5     order by job
  6    /

Execution Plan
---------------------------------------------------------------------------------
| Id  | Operation            |  Name  | Rows  | Bytes | Cost (%CPU)  |     Time  |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |        |   14  |  168  |    4 (25)    |  00:00:01 |
|   1 |  SORT UNIQUE         |        |   14  |  168  |    3  (0)    |  00:00:01 |
|   2 |   WINDOW SORT        |        |   14  |  168  |    4 (25)    |  00:00:01 |
|   3 |    TABLE ACCESS FULL |  EMP   |   14  |  168  |    3  (0)    |  00:00:01 |
---------------------------------------------------------------------------------

SQL> select job, sum(sal)
  2       from emp
  3     group by job
  4     order by job
  5    /

Execution Plan
--------------------------------------------------------------------------------
| Id  | Operation           |  Name  |  Rows  | Bytes | Cost (%CPU)  |    Time  |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |        |     5  |    60 |     4  (25)  | 00:00:01 |
|   1 |  SORT GROUP BY      |        |     5  |    60 |     4  (25)  | 00:00:01 |
|   2 |   TABLE ACCESS FULL |   EMP  |    14  |   168 |     3   (0)  | 00:00:01 |
--------------------------------------------------------------------------------

 

  • A full scan of EMP

  • A window sort—to partition the data by job, so it can add the SUM(SAL) to each and every row in the entire EMP table

  • A unique sort to remove the many duplicate rows, to return a single row per JOB

The SQL statement using aggregation, on the other hand, simply performs 

  • A full scan of EMP

  • A SORT GROUP BY to add up the salaries by JOB and output the results

So please use aggregates to aggregate and do not use analytics with DISTINCT to accomplish aggregation!

Varying IN Lists

"How do I handle varying IN lists?" is a frequently asked question on asktom.oracle.com, so I've decided to write up the various ways to do this.

First the problem statement: You have a string, and it looks like 

  • 1, 2, 3, 4

or 

  • 'A', 'B', 'C'

or something similar. You would like to retrieve all rows from some table/query such that some column is in that string. That is, you would like to execute 

SELECT * FROM t WHERE x IN (1,2,3,4)

or 

SELECT * FROM t WHERE y IN ('A','B','C')

You would like to use bind variables (because you've heard through the grapevine that bind variables are good). However, using bind variables as shown does not seem to work for you: 

SQL> variable txt varchar2(25)
SQL> exec :txt := ' ''SYS'', ''SYSTEM'' '
PL/SQL procedure successfully completed.

SQL> print txt

TXT
-----------------
'SYS', 'SYSTEM'

SQL> select *
  2    from all_users
  3   where username in (:txt);
no rows selected

SQL> select *
  2    from all_users
  3   where username in ('SYS','SYSTEM');

USERNAME   USER_ID    CREATED
--------   -------    ---------
SYS              0    30-JUN-05
SYSTEM           5    30-JUN-05

When you bind the IN list, the query returns no data; when you hard-code the IN list, however, the query returns data. The reason should be clear: the previous example, using the bind variable, is equivalent to the following query with literals: 

SQL> select *
 2    from all_users
 3   where username 
     in ( ' ''SYS'', ''SYSTEM'' ' );

no rows selected

There is a single string, a single value in that IN list. What you need to do is turn that into a set. Here are some approaches you can take:

If you have a finite number of items in the IN list. By all means, just bind the individual elements. That is, suppose you let the user pick as many as 10 items in a pick list. I strongly encourage you to use the query  

select * 
  from all_users 
 where username in 
( :bv1, :bv2, :bv3, 
  ... :bv10 );

And you would not use a single string; you would bind 10 inputs to this query (binding NULLs for any bind variables for which values are not set). This works well for small lists, but it would be quite tedious, obviously, for dozens or hundreds of items.
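In client code, this padding is mechanical. Here is a minimal sketch in Python, with SQLite standing in for any driver that supports bind placeholders; the table contents, the pick_users function, and the 10-item limit are made up for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table all_users (username text)")
con.executemany("insert into all_users values (?)",
                [("SYS",), ("SYSTEM",), ("SCOTT",)])

def pick_users(con, picks, max_items=10):
    # Pad the user's picks with NULLs up to the fixed number of binds.
    # NULL never matches anything in an IN list, so the padding is harmless,
    # and the SQL text stays constant, so the statement stays shareable.
    binds = list(picks) + [None] * (max_items - len(picks))
    sql = ("select username from all_users where username in (%s)"
           % ",".join("?" * max_items))
    return [r[0] for r in con.execute(sql, binds)]

print(pick_users(con, ["SYS", "SYSTEM"]))  # ['SYS', 'SYSTEM']
```

Because the query text never changes, every execution reuses the same parsed statement regardless of how many items the user actually picked.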

If you are in Oracle8i. You can use a function that returns a collection. You make it so that you can "query a string"—use the string as if it were a table. You'll need a collection type. I'll just use a table of VARCHAR2(4000) and a PL/SQL function that will parse a delimited string and return it as a collection, as shown in Listing 2.

Code Listing 2: Oracle8i function solution for varying IN lists  

SQL> create or replace type str2tblType as table of varchar2(4000)
2  /
Type created.

SQL> create or replace
2  function str2tbl
3  ( p_str in varchar2,
4    p_delim in varchar2 default ',' )
5  return str2tblType
6  as
7     l_str long default p_str || p_delim;
8     l_n   number;
9     l_data str2tblType := str2tblType();
10  begin
11      loop
12          l_n := instr( l_str, p_delim );
13          exit when (nvl(l_n,0) = 0);
14          l_data.extend;
15          l_data(l_data.count) := ltrim(rtrim(substr(l_str,1,l_n-1)));
16          l_str := substr( l_str, l_n+1 );
17      end loop;
18      return l_data;
19  end;
20  /
Function created.

SQL> column column_value format a10
SQL> exec :txt := 'SYS, SYSTEM'
PL/SQL procedure successfully completed.

SQL> select *
2      from TABLE( cast( str2tbl(:txt) as str2TblType ) )
3     /

COLUMN_VAL
-----------------
SYS
SYSTEM

SQL> select *
2      from all_users
3    where username in
4   (select *
5       from TABLE( cast( str2tbl(:txt) as str2TblType ) )
6   )
7   /

USERNAME  USER_ID    CREATED
--------  -------    ---------
SYS             0    30-JUN-05
SYSTEM          5    30-JUN-05
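To make the parsing logic in STR2TBL easy to follow outside the database, here is the same loop transliterated into Python (a sketch, not production code; find plays the role of INSTR, slicing the role of SUBSTR):

```python
def str2tbl(p_str, p_delim=","):
    # Port of the PL/SQL loop above: append one trailing delimiter, then
    # peel off a token at a time, trimming whitespace from each.
    s = (p_str or "") + p_delim
    out = []
    while True:
        n = s.find(p_delim)          # instr( l_str, p_delim )
        if n < 0:                    # exit when no delimiter remains
            break
        out.append(s[:n].strip())    # ltrim(rtrim(substr(l_str,1,l_n-1)))
        s = s[n + 1:]                # l_str := substr( l_str, l_n+1 )
    return out

assert str2tbl("SYS, SYSTEM") == ["SYS", "SYSTEM"]
```

The appended trailing delimiter is what lets the loop treat the last token exactly like every other one.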

If you are in Oracle9i Release 2 and above. You can skip the function altogether and just use DUAL to generate rows and parse the string. Consider 

SQL> with data
 2     as
 3   (select level l
 4      from dual
 5   connect by level <= 5)
 6   select *
 7     from data; 

    L
-------
    1
    2
    3
    4
    5

So, you can use DUAL to generate rows, and then—using SUBSTR/INSTR—you can effectively parse the bind variable and return the ith element from it. Listing 3 shows this parsing step, followed by the complete solution: the final SELECT statement using IN (SELECT * FROM data).

Code Listing 3: Oracle9i Release 2 and later solution for varying IN lists  

SQL> exec :txt := 'SYS, SYSTEM'
PL/SQL procedure successfully completed.

SQL> with data
2  as
3  (
4  select
5    trim( substr (txt,
6          instr (txt, ',', 1, level  ) + 1,
7          instr (txt, ',', 1, level+1)
8             - instr (txt, ',', 1, level) -1 ) )
9      as token
10    from (select ','||:txt||',' txt
11            from dual)
12  connect by level <=
13     length(:txt)-length(replace(:txt,',',''))+1
14  )
15  select * from data;

TOKEN
----------------
SYS
SYSTEM

SQL> with data
2  as
3  (
4  select
5    trim( substr (txt,
6          instr (txt, ',', 1, level  ) + 1,
7          instr (txt, ',', 1, level+1)
8             - instr (txt, ',', 1, level) -1 ) )
9      as token
10    from (select ','||:txt||',' txt
11            from dual)
12  connect by level <=
13     length(:txt)-length(replace(:txt,',',''))+1
14  )
15  select *
16    from all_users
17   where username in (select * from data);

USERNAME   USER_ID   CREATED
--------   -------   --------- 
SYSTEM           5   30-JUN-05
SYS              0   30-JUN-05
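The SUBSTR/INSTR arithmetic in that WITH DATA query is dense; the same computation in Python makes it easier to follow (a sketch: i+1 here plays the role of LEVEL in the query):

```python
def tokens(txt, delim=","):
    # Mirror of the WITH DATA query: wrap the string in delimiters, count
    # tokens as (number of delimiters in txt) + 1, then cut token i out
    # from between delimiter occurrence i and occurrence i+1.
    wrapped = delim + txt + delim              # ','||:txt||','
    n_tokens = txt.count(delim) + 1            # length(txt)-length(replace(...))+1
    pos = [i for i, ch in enumerate(wrapped) if ch == delim]
    return [wrapped[pos[i] + 1:pos[i + 1]].strip() for i in range(n_tokens)]

assert tokens("SYS, SYSTEM") == ["SYS", "SYSTEM"]
```

Wrapping the string in delimiters first is the same trick as in the SQL: it lets the first and last tokens be extracted by exactly the same formula as the middle ones.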

Now, some people look at that "WITH DATA" bit and say, "That's too much; it's too hard to code that every time." You can use a view to hide the complexity and a stored procedure to bind values to that view (in effect, a parameterized view): bind your query's IN list by calling MY_CTX_PROCEDURE, and the view does the rest, as shown in Listing 4.

Code Listing 4: View and procedure solution for varying IN lists 

SQL> create or replace context my_ctx using my_ctx_procedure
2    /
Context created.

SQL> create or replace
2    procedure my_ctx_procedure
3    ( p_str in varchar2 )
4    as
5    begin
6            dbms_session.set_context
7            ( 'my_ctx', 'txt', p_str );
8    end;
9    /
Procedure created.

SQL> create or replace view IN_LIST
2    as
3    select
4      trim( substr (txt,
5            instr (txt, ',', 1, level  ) + 1,
6            instr (txt, ',', 1, level+1)
7               - instr (txt, ',', 1, level) -1 ) )
8        as token
9      from (select ','||sys_context('my_ctx','txt')||',' txt
10              from dual)
11    connect by level <=
12       length(sys_context('my_ctx','txt'))
13         - length(replace(sys_context('my_ctx','txt'),',',''))+1
14    /
View created.

SQL> exec my_ctx_procedure( :txt )
PL/SQL procedure successfully completed.

SQL> select *
2      from all_users
3     where username in
4     (select * from IN_LIST);

USERNAME   USER_ID    CREATED
--------   -------    --------- 
SYSTEM           5    30-JUN-05
SYS              0    30-JUN-05

Automatic PGA Memory Allocation

Can Oracle Database allocate more PGA memory than the PGA_AGGREGATE_TARGET parameter value?

Absolutely. The PGA_AGGREGATE_TARGET init.ora parameter is not a hard limit; it is a soft limit—a goal. Oracle Database attempts to keep PGA memory allocations over all sessions under this target, but there are times when it cannot.

Next Steps



ASK Tom
Oracle Vice President Tom Kyte answers your most difficult technology questions. Highlights from that forum appear in this column.
asktom.oracle.com

DOWNLOAD
Oracle Database 10g Express Edition (Oracle Database XE)

There are two types of PGA memory: "tunable" and "untunable." Oracle Database can control the allocation size of tunable PGA memory, which includes things such as the sort and hash areas. Untunable PGA memory is memory a session allocates that Oracle Database cannot control, such as memory for PL/SQL table variables.

Suppose you have 100 sessions connected, and you set the PGA target to 100MB. Now suppose each of those 100 sessions, using dedicated server connections, runs some code that fills up a PL/SQL table variable with data, so the PL/SQL table consumes 5MB of memory. You now have at least 500MB of PGA memory allocated, and Oracle Database can't do anything to keep you within the 100MB limit. This is one way to exceed the PGA target.

Next, suppose you have 100 sessions connected, the PGA target is set to 10MB, and each of those 100 sessions opens 10 cursors that sort (so you have 1,000 open SQL statements). Even if Oracle Database used a tiny sort area of 64K, that would be 64,000K of memory—or 62.5MB. You would exceed your setting once again.
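The arithmetic behind both scenarios is worth spelling out; a quick sketch (sizes in KB, matching the figures above):

```python
# Back-of-the-envelope check of the two scenarios above (sizes in KB).
MB = 1024  # KB per MB

# Scenario 1: 100 dedicated sessions, each holding a 5MB PL/SQL table.
untunable = 100 * 5 * MB
assert untunable / MB == 500          # 500MB, five times the 100MB target

# Scenario 2: 100 sessions x 10 sorting cursors at a minimal 64K sort area.
tunable_floor = 100 * 10 * 64
assert tunable_floor / MB == 62.5     # well past the 10MB target
```

Either way, the total exceeds the target: untunable allocations cannot be shrunk at all, and even tunable work areas have a practical floor per open cursor.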

If you are interested in seeing demonstrations of this, my latest book, Expert Oracle Database Architecture: 9i and 10g Programming Techniques (Apress, 2005), shows some.

 


Tom Kyte has worked for Oracle since 1993. He is a vice president in the Oracle Public Sector group and the author of the book Effective Oracle by Design (Oracle Press, 2003), among others.

