As Published In
Oracle Magazine
September/October 2004

TECHNOLOGY: Ask Tom


On Fetching, Storing, and Indexing

By Tom Kyte Oracle Employee ACE

Our technologist fetches sequentially, stores inline, and indexes globally.

We've got a problem concerning an ORA-01002 error in a PL/SQL block. We learned that this error sometimes happens under special circumstances when updating selected data, such as when using a SELECT for an UPDATE. But this is not the case here. The strange thing is that we're getting this error only when reading data from a cursor, without any updates on the cursor data. The error occurs when executing a rollback to a savepoint statement: 

create table test (a integer);
insert into test values (1);
create table test2 (b integer);
insert into test2 values (11);

declare
   cursor c is select * from test;
begin
   update test2 set b = 22; 
   savepoint my_savepoint;
   for x in c
   loop
      -- do some work, but if we
      -- hit an error then:
      rollback to 
         savepoint my_savepoint;
   end loop;
end;
/


Our logic is complex; we are processing a result set, and in the loop we have other processing. If we encounter an error, we want to undo all of our work to date. But we've discovered that when we roll back to the savepoint, we always fail with error ORA-01002 "Fetch out of Sequence." We are using Oracle8i Release 3 (8.1.7), and up to now we couldn't find a similar problem in any forum, so any hints are welcome.

This makes sense, and, hopefully, you will agree. Your timeline was: 

  • At time t0, you did an update—you started a transaction.

  • At time t1, you claimed a "savepoint."

  • At time t1+, you could have modified the database.

  • At time t2, you opened the result set.

This result set can see the database "as of time t2, including any and all changes you made in your session—everything that happened from t0 on to t2." You could have modified the table TEST , for example. You didn't, but there is nothing that says you couldn't have.

At time t3, you say, "Put the data back the way it was at time t1." It is as if t1+ did not happen; in your case, it didn't happen, but it could have. Oracle doesn't know if it did—Oracle doesn't keep track.

So, the rollback to savepoint would have wiped out records that your cursor in theory should and would be able to see. It is like flashing back in a way. You invalidated that cursor. If that cursor remained open, it could well see records that don't, didn't, and won't ever exist.

The general rule is: any rollback to a savepoint that happened before the open of the cursor will cause this error. If this rollback didn't cause the error, inconsistent results would abound.

The solution to your coding dilemma is to open the cursor before the savepoint, as in the following: 

SQL> declare
  2    cursor c is select * from test;
  3    l_rec test%rowtype;
  4  begin
  5    update test2 set b = 22;
  6    open c;
  7    savepoint my_savepoint;
  8    loop
  9      fetch c into l_rec;
 10      exit when c%notfound;
 11      rollback to 
 12         savepoint my_savepoint;
 13    end loop;
 14  end;
 15  /

PL/SQL procedure successfully completed.


This will function as you expect. The result set pointed to by cursor C is consistent with respect to a point in time prior to the savepoint, so rolling back to the savepoint will not affect that cursor at all.

It is very interesting to note that in Oracle 10g, your original code would appear to work in some cases. The test case you provided works straight away (it does not return the ORA-01002 error). This is because Oracle 10g includes an optimization in PL/SQL that silently turns your single row fetches into BULK COLLECT s of 100 rows each. That is, PL/SQL is silently fetching as many as 100 rows on the first fetch and then doling them out as you request them. Therefore, the entire result set in your test case is actually retrieved from the database with the first fetch call, and subsequent fetches just retrieve from the PL/SQL memory cache. I can show that the overall behavior does not change with Oracle 10g (that of invalidating the cursor) by adding 100 more rows to the TEST table. I'll demonstrate by running your example as supplied in Oracle 10g and then again after adding 100 rows: 

ops$tkyte@ORA10G> declare
  2   cursor c is select * from test;
  3   l_rec test%rowtype;
  4  begin
  5   update test2 set b = 22;
  6   savepoint my_savepoint;
  7   for x in c
  8   loop
  9      rollback to
 10         savepoint my_savepoint;
 11   end loop;
 12  end;
 13  /

PL/SQL procedure successfully completed.
 
ops$tkyte@ORA10G> insert into test
  2  select rownum
  3    from all_objects
  4   where rownum <= 100;

100 rows created.
 
ops$tkyte@ORA10G> declare
  2   cursor c is select * from test;
  3   l_rec test%rowtype;
  4  begin
  5   update test2 set b = 22;
  6   savepoint my_savepoint;
  7   for x in c
  8   loop
  9      rollback to
 10         savepoint my_savepoint;
 11   end loop;
 12  end;
 13  /

declare
* 
ERROR at line 1:
ORA-01002: fetch out of sequence
ORA-06512: at line 7


Here it was the second "real" fetch—after the first 100 rows were exhausted in the PL/SQL memory cache—that raised the error.

CLOB or VARCHAR2s

One of the database tables I am working on may exceed 4,000 bytes, but it doesn't happen a lot. I am thinking that instead of using one CLOB, I can probably use two VARCHAR2(4000) columns. I think it may improve performance. What do you think?

I think if it may exceed 4,000 bytes, soon it may exceed 8,000.

I think a CLOB would be the correct approach. A CLOB up to 4,000 bytes will be stored inline as a VARCHAR2. If and when it exceeds that, the CLOB will be moved out of line. Hence, in your most common case, when the field is less than 4,000 bytes, the CLOB will be just like a VARCHAR2 in disguise.

In the infrequent case, it will be a CLOB stored out of line.

So later, when you need to index it with TEXT to search it, you have a single field that is easily indexed. And when it is decided to allow for 16K bytes of text, you are ready to go; there's no code to change. Later, when you are not working on this application, the coders that follow you will have a simple, single field to deal with.

Local or Global Index on Primary Key

I have a table of accounts that has 80 million records. I would like to partition the table by the ACCT_BY_DATE column. I will be going with a range partition and global indexes. My concern is regarding the primary key ACCT_ID. Should the index that will be created for the primary key be a local or a global index?

The index used to enforce the primary key can be local if and only if the partition key is, in fact, contained in the primary key. Otherwise, you will receive an ORA-14039 error (partitioning columns must form a subset of key columns of a unique index).

In this case, you are partitioning by ACCT_BY_DATE , but the primary key is ACCT_ID . The index used for the enforcement of that primary key must be a global index; there is no chance of using a local index.

If you think about the structure of a local index, the value ACCT_ID = 55 could be in any partition. Hence, any insert would have to check all local partitions for 55 and lock each index partition to ensure that no one even tries to insert ACCT_ID = 55 right now. It would be a huge scalability inhibitor, because only one person at a time could insert, update, or delete the primary key.

In a globally partitioned index, on the other hand, the value 55 can be in exactly one partition. Treat that partition the way you would any normal index, and you can achieve multiple insert, update, and delete operations at the same time.

Query Plans with Temporary Tables

I have two queries: 

  • Query 1: I have a query between a global temporary table and a table with 3 million records. This query takes about seven seconds to give the results.

  • Query 2: I have a second query between a normal table and the table with 3 million records. This query takes less than one second to give the results.

When doing an explain plan on the query with a global temporary table (Query 1), I found out that a full scan is being done on the table with 3 million records. The full scan does not happen on the query with the normal table (Query 2).

How can I reduce the time from seven seconds in Query 1? Why is a full scan being done on the table with 3 million records when it is joined with a global temporary table, and how can this be avoided?

The root cause here is the optimizer's lack of statistical information. By default, the optimizer will assume that there are N rows in a global temporary table (N is 8,168 in an 8K block size database). Since it would be rare for your global temporary table to have 8,168 rows in real life, you need to give a hand to the optimizer and provide it with realistic statistics. Before you do that, let's see how many rows the temporary table is assumed to have by using autotrace in SQL*Plus: 

9iR2> create global temporary table
  2  gtt ( x int );

Table created. 
9iR2> set autotrace traceonly explain
9iR2> select /*+ first_rows */ *
  2    from gtt;
 
Execution Plan
---------------------------------------
SELECT STATEMENT 
(Cost=11 Card=8168 Bytes=106184)
  TABLE ACCESS (FULL) OF 'GTT' 
  (Cost=11 Card=8168 Bytes=106184)


It is interesting to note that, in Oracle 10g, you will observe something very different by default in the EXPLAIN plan: 

10G> select *
  2    from gtt;
 
Execution Plan
---------------------------------------
SELECT STATEMENT 
(Cost=2 Card=1 Bytes=13)
  TABLE ACCESS (FULL) OF 'GTT'
  (Cost=2 Card=1 Bytes=13)


The underlying reason Oracle 10g is getting a much better estimate of the true size of the global temporary table is going to be one of your three solution choices.

In Oracle 10g, since the Cost-Based Optimizer (CBO) is the only optimizer, it is much more important to have correct statistics. Therefore, the database employs a technique called dynamic sampling, first introduced with Oracle9i Release 2. Dynamic sampling permits the optimizer to take a quick look at the table when statistics are missing. It will sample the data in the table to come up with better estimates of what it is dealing with.

So the three solutions available to you are 

  • Using dynamic sampling

  • Using DBMS_STATS.SET_TABLE_STATS

  • Using the CARDINALITY hint

I'll demonstrate how to use each in turn.

In Oracle 10g, dynamic sampling will work out-of-the-box, because the default setting has been increased from 1 to 2. At level 2, the optimizer will dynamically sample any unanalyzed object referenced in a query processed by the optimizer prior to evaluating the query plan. I can use an ALTER SESSION|SYSTEM command in Oracle9i Release 2 to make it behave the way Oracle 10g does by default, or I can use the dynamic sampling hint as follows: 

9iR2> select /*+ first_rows
  2    dynamic_sampling(gtt 2) */ *
  3    from gtt;
 
Execution Plan
----------------------------------
SELECT STATEMENT
(Cost=11 Card=1 Bytes=13)
  TABLE ACCESS (FULL) OF 'GTT' 
  (Cost=11 Card=1 Bytes=13)


There I set the dynamic sampling to level 2 for the table GTT in this query. The optimizer therefore quickly scans the table to come up with more-realistic estimates of the true size of this table. The following example adds rows to the GTT table and runs the dynamic sampling hint to have the optimizer sample the unanalyzed objects referenced in the query. Note the increased Card= values: 

9iR2> insert into gtt
  2  select rownum
  3    from all_objects;

32073 rows created.
 
9iR2> set autotrace traceonly explain
9iR2> select /*+ first_rows
  2     dynamic_sampling(gtt 2) */ *
  3    from gtt;
 
Execution Plan
------------------------------------
SELECT STATEMENT
(Cost=11 Card=32073 Bytes=416949)
  TABLE ACCESS (FULL) OF 'GTT' 
  (Cost=11 Card=32073 Bytes=416949)


Another solution, useful in versions prior to Oracle9i Release 2, is to use DBMS_STATS to set representative statistics on the global temporary table. You can do this after the table is created, or you can do it after filling the global temporary table with data (and hence you'll know how many rows it has). If the temporary table is generally the same size from run to run, you would want to set this value just once and be done with it. If the temporary table sizes vary widely, you might consider setting the statistics after populating the table. Since DBMS_STATS implicitly commits (it is like DDL in that sense), you need to be careful how you use it. The following example demonstrates using an AUTONOMOUS_TRANSACTION to permit the use of DBMS_STATS without committing your current transaction's work: 

9iR2> declare
  2   pragma autonomous_transaction;
  3  begin
  4   dbms_stats.set_table_stats
  5   ( user, 'GTT', 
  6     numrows=> 12345 );
  7   commit;
  8  end;
  9  /

PL/SQL procedure successfully completed.
 
9iR2> set autotrace traceonly explain
9iR2> select * from gtt;
 
Execution Plan
------------------------------------
SELECT STATEMENT Optimizer=CHOOSE
(Cost=11 Card=12345 Bytes=160485)
  TABLE ACCESS (FULL) OF 'GTT' 
  (Cost=11 Card=12345 Bytes=160485)


Note that a commit in this example could well clear out a global temporary table, undoing all of your work on it! That is why the AUTONOMOUS_TRANSACTION is really important here.

Next Steps


ASK Tom
Oracle Vice President Tom Kyte answers your most difficult technology questions. Highlights from that forum appear in this column.

READ

 about in-memory collections

 about DIRECT grants

More Tom
 Effective Oracle by Design

The optimizer now believes that the table GTT has 12,345 rows in it and will use that fact whenever it optimizes a query that references that table.

The third solution is the CARDINALITY hint. I include this because it is the only option when using collection variables (not global temporary tables, but rather in-memory tables contained in a PL/SQL collection). The database will not dynamically sample these, nor are they real tables, so no statistics can be stored about them. The only way to communicate to the database the estimated size of this sort of object is to use this hint, as in the following: 

9iR2> select 
  2  /*+ cardinality( gtt 999 ) */ *
  3  from gtt;
 
Execution Plan
-----------------------------------
SELECT STATEMENT Optimizer=CHOOSE 
(Cost=11 Card=999 Bytes=12987)
  TABLE ACCESS (FULL) OF 'GTT' 
  (Cost=11 Card=999 Bytes=12987)


Here I explicitly told the optimizer how many rows it could expect to find in my global temporary table. Note that the CARDINALITY hint is available only in Oracle9i Release 1 and later releases. See Next Steps for a link to a full example demonstrating this technique with in-memory PL/SQL tables.

What's at the End of the File?

I have a tablespace (LMT) with about 1,000 objects (500 tables and 500 indexes). The tablespace consists of three datafiles (3 * 4GB). After some reorganization, one datafile is almost empty, but resizing this datafile fails because some objects have extents at the end of the datafile. Is there a simple way to find the object(s) to which these extents belong?

This is pretty easy to figure out, actually. All you need is the name of the datafile. Once you have that, you can turn the filename into a FILE_ID and, using that FILE_ID , query DBA_EXTENTS to find all extents allocated in it. If you sort that result set by BLOCK_ID in descending order, the first N rows that come back are the extents at the end of the file—the ones that are making it so you cannot shrink the datafile. It is these objects you would move (using ALTER TABLE MOVE for tables and ALTER INDEX REBUILD for indexes) in order to be able to reclaim that space. The query is 

select *
  from (
select owner, segment_name, 
       segment_type, block_id
  from dba_extents
 where file_id = 
   ( select file_id
       from dba_data_files
      where file_name = :FILE )
 order by block_id desc
       )
 where rownum <= 5


The query finds the five outermost extents in a file. An interesting companion query to go along with this is one I call "maxshrink." It'll generate the ALTER DATABASE commands to shrink datafiles to their smallest possible size: 

column value new_val blksize
select value 
  from v$parameter 
 where name = 'db_block_size'
/

select 'alter database datafile ''' || 
 file_name || ''' resize ' ||
 ceil( (nvl(hwm,1)*&&blksize)/1024/1024 )
 || 'm;' cmd
from dba_data_files a,
     ( select file_id, 
         max(block_id+blocks-1) hwm
         from dba_extents
        group by file_id ) b
where a.file_id = b.file_id(+)
  and 
 ceil(blocks*&&blksize/1024/1024)-
      ceil((nvl(hwm,1)*
      &&blksize)/1024/1024 ) > 0
/ 


Custom Audit

In one of my production environments, after the record is created in a table with valid_ind='Y', some code or program using an UPDATE statement is setting the valid_ind='N' for the same record. Is there any way to find out which OS process or database user or SQL statement is doing this update? We tried our best to check all the code.

That is pretty simple. You can put an AFTER UPDATE, FOR EACH ROW trigger on that table containing code similar to the following: 

if ( :old.valid_ind = 'Y' 
     and :new.valid_ind = 'N' )
then
  insert into my_log
  select sysdate, a.* 
    from v$session a
   where sid = (select sid 
                  from v$mystat 
                 where rownum = 1);
end if;


Note that the owner of the trigger needs a DIRECT grant by SYS on the views V_$SESSION and V_$MYSTAT (the underscores are relevant). See Next Steps for a link that describes the technical reason why this grant must be made directly.

Once that trigger is in place, you'll capture all of the session-level information (including the username, program name, OS terminal used, and so on) in a log table. This table can be created easily as follows: 

create table my_log
as
select sysdate dt, 
       a.* 
  from v$session a
 where 1=0; n


Tom Kyte (thomas.kyte@oracle.com) has worked for Oracle since 1993. Kyte is a vice president in the Oracle Government, Education, and Healthcare group and the author of Effective Oracle by Design (Oracle Press, 2003) and Expert One-on-One: Oracle (Apress, 2003).

Send us your comments