As Published In
Oracle Magazine
July/August 2001

TECHNOLOGY: Ask Tom


Getting Just the Rows I Want

By Tom Kyte

Our Oracle expert looks for the number of rows searched, the fastest way to go, and how to automate the creation of a materialized view.

I have a cursor with an ORDER BY clause that returns 10,000 rows, but I'm restricting it to 500 rows with ROWNUM. Does the Oracle database store all 10,000 rows in memory, or does it open the cursor with only the 500 rows? For example: 

SELECT empno, empnm
FROM   ( SELECT empno, empnm
         FROM  employee
         ORDER BY empno )
WHERE  ROWNUM < 500; 


This is a frequent question—especially with the abundance of Web-based applications trying to "paginate" through result sets without maintaining a state.

Generally, the Oracle database does not store an entire result "in memory." Instead, the database tries to answer the query and return the first row before it gets the last row. Consider this: 

select * from one_billion_row_table;

If the database attempted to store the result set in memory, we would never be able to ask a query like that. Oracle generally answers the query on the fly whenever possible. In the above case—if EMPNO is indexed and EMPNO is NOT NULL—the optimizer will read the index to process the ORDER BY and stop after reading 500 rows. For example: 

ops$tkyte@ORA8I.WORLD> create table emp as
  2  select object_id empno, object_name ename
  3  from all_objects;
Table created. (with 20,000 plus rows)

ops$tkyte@ORA8I.WORLD> desc emp

Name       Null?         Type
-----      -----         ----
EMPNO      NOT NULL      NUMBER
ENAME      NOT NULL      VARCHAR2(30)

ops$tkyte@ORA8I.WORLD> create index emp_idx on emp(empno);
Index created.

ops$tkyte@ORA8I.WORLD> select empno, ename
  2  from ( select empno, ename from emp order by empno )
  3  where rownum < 500;

499 rows selected.

SQL_TRACE/TKPROF reports:

Rows     Row Source Operation
----     --------------------
 499     COUNT STOPKEY
 499      VIEW
 499       TABLE ACCESS BY INDEX ROWID EMP
 499        INDEX FULL SCAN (object id 27418)

So the database starts a full scan of the index, reading the data in order from the table, and stops after 500 rows. It never gets to the other 19,500 rows. If, on the other hand, you did something like this: 
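For the Web-application pagination case, this pattern extends to fetching an arbitrary window of rows. The following is an illustrative sketch (the bind-variable names :min_row and :max_row are hypothetical, not part of the example above); the inner ROWNUM is frozen by the inline view under the alias RNUM, so the outer query can apply a lower bound as well:

```sql
-- Hedged sketch: fetch rows :min_row .. :max_row of the ordered result.
select empno, ename
from ( select a.*, rownum rnum
       from ( select empno, ename from emp order by empno ) a
       where rownum <= :max_row )
where rnum >= :min_row;
```

Because the `rownum <= :max_row` predicate sits inside the view, the database can still use the COUNT STOPKEY optimization and stop reading the index early.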

ops$tkyte@ORA8I.WORLD> select *
  2  from ( select ename, count(*) from emp group by ename )
  3  where rownum < 500;

499 rows selected.

SQL_TRACE/TKPROF shows:

Rows     Row Source Operation
-----    --------------------
  499    COUNT STOPKEY
  499     VIEW
  499      SORT GROUP BY STOPKEY
21933      TABLE ACCESS FULL EMP

That is, the database has to answer the subquery in full first and only then select 500 rows from it. Here the subquery's temporary result set is typically materialized in temporary segments, depending on its size, and then read back from there. 

Finding the Fastest Way

Within a large loop, if I want to commit every 1,000 (or million, or billion) records, which is faster: using mod() and then commit, as in: 

LOOP
    cnt := cnt + 1;
    IF ( mod( cnt, 1000 ) ) = 0 THEN         
       commit;
    END IF;
END LOOP;

or setting up a counter, committing when it reaches the limit, and then resetting the counter to 0, as in: 

LOOP
   cnt := cnt + 1;
   IF cnt = 1000 THEN       
     commit;
     cnt := 0;
   END IF;
END LOOP;

Do you want to commit in the loop in the first place? If you don't, the fastest way is to not even consider committing every n rows. Configure sufficient rollback and do the work in a single UPDATE statement. Don't use a loop at all: row-by-row processing is slower than a single INSERT INTO ... SELECT or a single UPDATE statement.
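As a sketch of that advice (the table and column names here are made up purely for illustration, not taken from the question):

```sql
-- Hedged sketch: one statement, one commit, instead of a PL/SQL loop
-- with interim commits. big_table and status are hypothetical names.
update big_table
   set status = 'PROCESSED'
 where status = 'PENDING';

commit;
```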

Next Steps
 Oracle Managing Technologist Tom Kyte answers your most difficult Oracle technology questions in Oracle Publishing Online's forum Ask Tom, at asktom.oracle.com. Highlights from that forum appear in this column. 

 Materialized views were introduced with Oracle8i and are available in Oracle8i Enterprise and Personal editions. For more information on materialized views, see the Oracle8i Data Warehousing Guide on the Oracle Technology Network.

The second-fastest way is to configure sufficient rollback and do the transaction in a single loop without commits. Each time you commit, you must wait for the log-writer process to fully flush the buffers. If you just keep going, LGWR does this in the background—you'll have to wait once at the end instead of every n rows. The commit is what will slow you down. You'll have to consider how to restart this process too. Watch out for ORA-01555, which will inevitably get you if you're updating the table you're reading.
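If you truly must commit every n rows, one way to keep the process restartable is to drive it off a column that records what has already been done. The following is an illustrative sketch only; the table name, column name, and 1,000-row batch size are all hypothetical:

```sql
-- Hedged sketch: re-running this block picks up where the last
-- committed batch left off, because processed rows are flagged.
begin
    loop
        update big_table
           set status = 'DONE'
         where status = 'PENDING'
           and rownum <= 1000;
        exit when sql%rowcount = 0;
        commit;  -- each commit leaves the work in a restart-safe state
    end loop;
    commit;
end;
/
```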

As for the code above, we can use the source-code profiler provided in Oracle8i to analyze this. I created and ran the following procedures: 

create or replace procedure do_mod
as
    cnt number := 0;
begin
    dbms_profiler.start_profiler( 'mod' );
    for i in 1 .. 500000
    loop
        cnt := cnt + 1;
        if ( mod(cnt,1000) = 0 )
        then
            commit;
        end if;
    end loop;
    dbms_profiler.stop_profiler;
end;
/

create or replace procedure no_mod
as
    cnt number := 0;
begin
    dbms_profiler.start_profiler( 'no mod' );
    for i in 1 .. 500000
    loop
        cnt := cnt + 1;
        if ( cnt = 1000 )
        then
            commit;
            cnt := 0;
        end if;
    end loop;

    dbms_profiler.stop_profiler;
end;
/

Running the profiler reports, I find 

Percentage of time in each module,
summarized across runs

UNIT_OWNER     UNIT_NAME       SECS     PERCENTAGE
----------     ---------       ----     ----------
OPS$TKYTE      DO_MOD          8.18     71.67
OPS$TKYTE      NO_MOD          3.23     28.32
SYS            DBMS_PROFILER    .00       .00

Already, this shows that the routine using the MOD function takes longer: over 8 seconds, versus just over 3 seconds for the routine without it. To prove that it is the MOD function itself, see Listing 1: using MOD took about 5.5 seconds, whereas the if ( cnt = 1000 ) test took 1 second, plus the time to do cnt := 0, for a total of about 1.5 seconds. 

Automatic Summary Tables

I have two tables, TABLE1 and TABLE2, and I want to create a trigger like: 

create or replace trigger trig_1
after insert on table1
  ....
  update table2
     set col1 = .... (based on sum(col2) of table1)
   where key_condition
  ....

Within this trigger, I want to update TABLE2. The update is based on the summation of COL1 in TABLE1. But if I do select sum(col1) from table1 inside the trigger, TABLE1 is mutating. There is no foreign-key relationship, and COL1 in TABLE2 and COL2 in TABLE1 are not keys in either table.

This might best be handled by a materialized view—a mechanism whereby the server will maintain summary tables automatically, either upon committing changes to the database, on a recurring (timed) basis, or on demand. This example demonstrates how you might store the rolled-up sum of a column in a detail table in another summary table. We'll have the summary refreshed upon commit to keep the rolled-up summary as current as the last transaction. First, our detail table: 

SQL> create table t1 ( a  int,
  2     b int,
  3     y number,
  4     primary key(a,b) )
  5  /
Table created.

On this table, we will need to keep a materialized view log to capture the changes made to T1 during our transaction. We'll keep a summary of column Y rolled up by the values of column A—hence those columns must appear in our materialized view log: 

SQL> create materialized view log on t1
  2  with rowid ( a, y ) including new values
  3  /
Materialized view log created.

We are ready for our materialized view: 

SQL> create materialized view t2
  2  build immediate
  3  refresh on commit
  4  as
  5  select a, count(*) cnt, count(y) cnt_y, sum(y) sum_y
  6  from t1 group by a
  7  /
Materialized view created.

We need to maintain more than just A and SUM(Y) in this case, so that the database can incrementally refresh this view rather than rebuild it. The counts are small and probably useful to our applications anyway. Now load up table T1: 

SQL> begin
  2      for i in 1 .. 10
  3      loop
  4          for j in 1 .. 10
  5          loop
  6              insert into t1 values ( i, j, dbms_random.random );
  7          end loop;
  8      end loop;
  9  end;
 10  /
PL/SQL procedure successfully completed.

SQL> commit;
Commit complete.

SQL> select t2.a, t2.sum_y,
  2         ( select sum(y) from t1 where t1.a = t2.a ) sum_t1_y
  3    from t2
  4  /

A  SUM_Y        SUM_T1_Y
-  -----        --------
1  -2.985E+09   -2.985E+09
2  -1.788E+09   -1.788E+09
3  -38754027    -38754027
4  -4.144E+09   -4.144E+09
5  164759646    164759646
6  2717491103   2717491103
7  6081803620   6081803620
8  1222195769   1222195769
9  706431010    706431010
10 -6.878E+09   -6.878E+09

10 rows selected.

The materialized view T2 automatically kept SUM(Y) for us—when compared directly to the actual SUM(Y) from T1, the results are the same. We can insert, update, and delete T1; the changes will be reflected in T2 automatically.
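For example, subsequent DML is picked up at the next commit. This continuation is a hedged sketch, not output from the original session:

```sql
-- Illustrative: remove one group's detail rows and commit; the
-- on-commit refresh removes that group from the summary as well.
delete from t1 where a = 1;
commit;

select * from t2 where a = 1;
-- the group for A = 1 no longer appears in the summary
```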


Columnist Tom Kyte ( thomas.kyte@oracle.com) has worked with the Oracle Services Industry Technology Group in Reston, Virginia, since 1993.
