As Published In
Oracle Magazine
July/August 2004

TECHNOLOGY: Ask Tom


On Numbers and Analytics

By Tom Kyte Oracle Employee ACE

Our technologist searches, numbers, analyzes, and orders.

I have a control table that contains various values: numeric, character, and dates. They are all stored as VARCHARs, but when a value is numeric, can I perform any easy check to verify that a value is a number? Is there a function "isnumeric" that will do this?

I prefer to store dates as dates and numbers as numbers and would probably be looking for a different solution, if this table is going to be of any real size in the future. There are no native "isnumeric"-type functions in Oracle; however, it is very easy to write one. Take the following function, for example: 

SQL> create or replace
  2  function isnumeric
  3  ( p_string in varchar2)
  4  return boolean
  5  as
  6      l_number number;
  7  begin
  8      l_number := p_string;
  9      return TRUE;
 10  exception
 11      when others then
 12          return FALSE;
 13  end;
 14  /
 Function created.


This function will return TRUE if the input string can be converted to a number by use of the current session's NLS settings. For dates, you can get a bit fancier and pass in the format to use: 

SQL> create or replace
  2  function isdate
  3  ( p_string in varchar2,
  4      p_fmt in varchar2 := null)  
  5    return boolean
  6  as
  7      l_date date;
  8  begin
  9      l_date :=
 10       to_date(p_string,p_fmt); 
 11       return TRUE;
 12  exception
 13      when others then
 14          return FALSE;
 15  end;
 16  /
Function created.


Analytics to the Rescue (Again)

I have the following test data: 

select * 
  from test_sort;

T_NM   P_NM   ID
----   ----   --
01t    02P    36
01t    01p    37
03t    02P    38
03t    01p    39
02T    02P    40
02T    01p    41


I would like to run a query that produces data in the following order: 

T_NM   P_NM   ID
----   ----   --  
01t    01p    37
01t    02P    36
03t    01p    39
03t    02P    38
02T    01p    41
02T    02P    40


The sorting is based on the minimum ID by T_NM (36 for every 01t, 38 for every 03t, and 40 for every 02T) and then sorting within that by P_NM. I've tried various subqueries and analytic functions (rank, dense_rank, row_number). It seems so simple, but I can't seem to get it. Any ideas?

The answer here is hidden in the question; you described the sorting as "based on the minimum ID by T_NM ." That is almost the solution in this case. All you need to do is type in this SQL: 

SQL> select t_nm, p_nm, id,
  2    min(id)
  3     over (partition by t_nm)
  4          min_id
  5    from test_sort
  6   order by 4, 2
  7  /
 
T_NM   P_NM   ID   MIN_ID
----   ----   --   ------
01t    01p    37      36
01t    02P    36      36
03t    01p    39      38
03t    02P    38      38
02T    01p    41      40
02T    02P    40      40
 
6 rows selected.


You can now order by the minimum ID by T_NM and then, within that, by the P_NM column. Analytics rock.

Case-Insensitive Searching

We have a third-party application we use for selecting specific values from several tables. The character values stored in a field can be in any of the following formats: Bell, bell, BELL, beLL, and so on. Is there a way to force (change) values (either to upper- or lowercase) during DML other than by using triggers? And if not, is there a way to force a SELECT query (other than using functions UPPER or LOWER) to return all the values, regardless of case?

In Oracle 10g, there is a way to do this. Prior to Oracle 10g, you had to adopt one of the following strategies:  

  • Use a function-based index on UPPER ( column_name ) and modify the queries to use WHERE UPPER ( column_name ) = value.

  • Use a trigger to roll the column value to upper- or lowercase upon modification.

  • Use Oracle Text to create a TEXT index on the column; text indexes (which would then mandate the use of the CONTAINS operator in the predicate) can be case-sensitive or -insensitive.

In each of these cases, one of your conditions would have been violated. You would have needed to use triggers, or UPPER() or CONTAINS in the WHERE clause.

In Oracle 10g, you can do this transparently. Suppose you have the following table of data:  

SQL> select * from t;
 
DATA
--------------
Hello
HeLlO
HELLO


And suppose you run the following query: 

SQL> variable x varchar2(25)
SQL> exec :x := 'hello';
 
SQL> select *
  2    from t
  3      where data = :x;

no rows selected


Nothing in that table matches "hello" in lowercase. Well, with some ALTER SESSION s, you can change all that: 

SQL> alter session
  2  set nls_comp=ansi;
Session altered.

SQL> alter session
  2  set nls_sort=binary_ci;
Session altered.
 
SQL> select *
  2    from t
  3   where data = :x;

DATA
--------------
Hello
HeLlO
HELLO


This is the same query with the same input data, but with a very different result. The effect of these two ALTER SESSION directives is to enable case- insensitive comparisons for =, <, and >.

You can, in fact, take this one step further and even index the data in a case-insensitive fashion, allowing searches on this data to utilize indexes when it makes sense. Consider the following example: 

SQL> create index t_idx on
  2  t( nlssort( data,
  3  'NLS_SORT=BINARY_CI' ) );
Index created.
 
SQL> begin
  2   dbms_stats.set_table_stats
  3   (ownname=>user,
  4    tabname=>'T',
  5    numrows=> 100000);
  6  end;
  7  /


So, I created a function-based index on the DATA column and used the binary case insensitive sort in doing so. Then I faked out the optimizer and told it that the table has lots of rows. Using AUTOTRACE , you can see that the optimizer can and will use an index to access this information: 

SQL> set autotrace on
SQL> set linesize 121
SQL> select *
  2    from t
  3   where data = :x;

DATA
--------------
Hello
HeLlO
HELLO

Execution Plan
------------------------------
SELECT STATEMENT (Cost=2)
 TABLE ACCESS (BY INDEX ROWID) 
   INDEX (RANGE SCAN) OF 'T_IDX'


Removing Duplicates

Please explain how to remove duplicate records quickly in a single run from a large table containing about 5 million records. I tried it with the following query, but it takes 10 hours: 

delete from test1
  where rowid not in (select min(rowid) 
  from test1 group by rc_no); 


Even after increasing the rollback segment tablespace to 7GB, we are not getting the desired results.

If you are using the rule-based optimizer (RBO), there is a very good chance that the NOT IN query is being evaluated once for each row in the outer query! So, hopefully, you are using the much smarter cost-based optimizer (CBO).

But in any case, my approach to removing duplicates is a little different. I would generate the set of rowids to delete by using analytics and then delete them, like this: 

SQL> select count(*), 
  2 count(distinct cust_seg_nbr)
  3 from t;
 
  COUNT(*)   COUNT(CUST_SEG_NBR)
 ---------    -------------------
  1871652                756667


Here you can see that I have more than 1.8 million rows but only 756,667 unique ones. I need to delete about 1.1 million of my rows (lots). Let's see how speedy this can be: 

delete from t
 where rowid in 
(select rid
    from 
   (select rowid rid,
            row_number() over
            (partition by 
                 cust_seg_nbr 
             order by rowid) rn
        from t
    )
  where rn <> 1 
)
1114985 rows deleted.
Elapsed: 00:01:46.06

SQL> select count(*), 
  2  count(distinct cust_seg_nbr)
  3 from t;
  COUNT(*)  COUNT(DISTINCTCUST_SEG_NBR)
 ---------  ---------------------------
    756667                     756667


Now, that was the speed on a laptop computer (so your mileage may vary). Also, in my case, the table was not indexed. Each index is going to add more processing time. Consider disabling your indexes for a mass duplicate removal like this and then rebuilding your indexes afterward. Index maintenance can be very expensive on a large bulk operation in which you plan to delete most of the data.

As for the rollback size, it will necessarily get as big as it needs to be in order to process the delete. Every index you have will make the delete need more space, in addition to making it take more time.

Another very viable option when mass-deleting data is to create a new table that keeps just the right records and drops the old table, as in the following example: 

SQL> select count(*), 
  2  count(distinct cust_seg_nbr)
  3  from t;

  COUNT(*)   COUNT(CUST_SEG_NBR)
 ---------    -------------------
   1871652               756667

create table t2
as
select cust_seg_nbr, ...
  from 
 (select t.*, 
     row_number() over 
       (partition by cust_seg_nbr 
        order by rowid) rn
    from t
  )
 where rn = 1
/
Table created.
Elapsed: 00:00:10.93


As you can see, it took 11 seconds in this example to copy out the rows to keep, versus 1 minute and 46 seconds to remove the rows you don't want, and you can do this without generating any UNDO (rollback) or REDO to boot.

More Numbers

I have a table with a VARCHAR2 field, and I want to query everything from the table where the VARCHAR2 field has a valid number. For example, if my table has the following data: 

name    seq
----    ---
ab      10
cd      1-
ef      1a
gh      12
ij      1.0


I want the query result to be 

ab  10
gh  12


I reject everything that does not have a number format. I will use the query you give me very often to do updates in that table.

Well, this is very similar to the earlier item on the "isnumeric" function. In fact, the solution is just about the same, but I'll add a function-based index to aid in identifying the rows, using an index when appropriate.

What I'll do is write a very small PL/SQL function that is deterministic (meaning that the function always returns the same answer, given the same inputs). In this case, I'll have it return only two values: either 1 (meaning yes, the input was a number) or NULL (meaning no, the input was not a number). 

SQL> create or replace
  2  function is_number
  3  ( p_string in varchar2 )
  4    return number
  5  deterministic
  6  as
  7     l_num number;
  8  begin
  9     l_num := p_string;
 10     return 1;
 11  exception
 12     when others then
 13        return null;
 14  end;
 15  /
Function created.


Now, I can take any one of your tables with this VARCHAR2 column and issue the following:  

                                SQL> create index t_idx   2  on t(is_number(data)); 
                            


If you query SELECT * FROM T WHERE IS_NUMBER(data) = 1 , you'll get all of the rows with valid numbers in them via the index, if appropriate. Note that this index will be very compact, because Oracle B*Tree indexes do not create entirely NULL entries, so any row in the table that does not have a number in it will not be represented in the index.

Order Anything

Unless I hard-code the ORDER BY clause in a hard ref cursor, the ORDER BY gets ignored when I'm fetching from the cursor. Can I achieve dynamic ordering on Oracle8i Release 3 (8.1.7) or any higher version? I can use parameters in all other places I've tried except the ORDER BY. Here is a tiny piece of sample code:  

open p_cursor for
  select ename, empno 
    From emp
  order by 
    decode( variable, 
       '1', 'ENAME', 
       '2', 'EMPNO' )


It is not that the ORDER BY DECODE was ignored; it's that the ORDER BY DECODE you used was akin to 

select * 
  from t 
 order by 'hello world';


That is what you are doing in effect—ordering by a constant string. So the ORDER BY was not ignored—it just was not what you wanted to do!

Here are two methods to achieve this. The first uses DECODE ; the other uses dynamic SQL. Let's take a look at DECODE first. I can write 

order by 
  decode( l_sort_by, 
    '1', ename,
    '2', to_char(empno, '0009'))


This DECODE always returns a character string to sort by. If l_sort_by = 1 , it returns ename to sort or returns the EMPNO formatted as a fixed-width string field so it sorts properly. If you have a date, you will use to_char(dt, 'yyyymmddhh24miss') . If you have numbers that can be negative, you'll need to find a format that sorts them correctly. The important thing is to return a character string that sorts properly for you in all cases.

Next, we can use what I regard as the easier method of dynamic SQL or, if you have just a few columns, a static SQL statement opened in an IF/THEN/ELSIF block. First the dynamic SQL approach: 

begin
 l_query := 'select ...';
 if (l_sort_by = '1') then
   l_query := l_query || 'ename';
 elsif (l_sort_by = '2') then
   l_query := l_query || 'empno';
 ...
 end if;
 open l_cursor for l_query;


The following static SQL example is not too different and would be my choice for a situation with only two or three ORDER BY s (because static SQL is preferable to dynamic SQL, for many reasons): 

begin
  if l_sort_by = '1' 
  then
     open l_cursor 
     for select ... 
     order by ename;
  else
     open l_cursor 
     for select ... 
     order by empno;
  end if;
  loop
    fetch l_cursor into ...;
    exit when l_cursor%notfound;


Reorganizing with Oracle 10g

You mentioned that Oracle Database 10g reallocates space in certain situations, such as the deleting of every other row in a table. Please elaborate.

Online segment shrink is available for tables in ASSM (Automatic Segment Space Management) tablespaces. Conceptually, what happens is that Oracle reads the table from the bottom up, and upon finding rows at the bottom of the table, it deletes them and reinserts them at the top of the table. When it runs out of space at the top, it stops, leaving all the free space at the end—or bottom—of the table. Then Oracle redraws the high-water mark for that table and releases that allocated space. Here is a quick example: 

SQL> create table t
  2  ENABLE ROW MOVEMENT
  3  as
  4  select *
  5    from all_objects;
Table created.


Here I created a table with ENABLE ROW MOVEMENT . Oracle will be physically moving the rows, and this clause gives Oracle permission to change the rowids. Here's what a full scan of this big table does: 

SQL> set autotrace on statistics
SQL> select count(*) from t;
 
   COUNT(*)
-----------
     47266
 
Statistics
-------------------------------
          0  db block gets
        724  consistent gets
        651  physical reads

SQL> set autotrace off


It took 724 logical IOs ( consistent gets ) to read that table and count the rows. A peek at USER_EXTENTS shows the table consuming 768 blocks in 20 extents.

Over time, I perform some deletes on this table, leaving behind lots of white space. I'll simulate that by deleting every other row in the table: 

SQL> delete from t 
  2  where mod(object_id,2) = 0;
23624 rows deleted.


 

Next Steps


ASK Tom
Oracle Vice President Tom Kyte answers your most difficult technology questions. Highlights from that forum appear in this column.

 READ more about function-based indexes

 READ Effective Oracle by Design

Now I want to reclaim this white space, getting it back from the table and perhaps using it for other objects, or maybe I full-scan this table frequently and would just like it to be smaller. Before Oracle Database 10g, the only option was to rebuild it, with EMP/IMP, ALTER TABLE MOVE , or an online redefinition. In Oracle Database 10g, I can compact and shrink it:

 

SQL> alter table t shrink space compact;
Table altered.
 
SQL> alter table t shrink space;
Table altered.


Another peek at USER_EXTENTS shows that the table now consumes 320 blocks in 17 extents. The table has actually shrunk while still online and without a rebuild. It is now half its original size in blocks, because it released extents back to the system—something that was never possible before. Further, look what this shrinking does for a full scan: 

SQL> select count(*) from t;

   COUNT(*)
-----------
     23642
 
Statistics
-------------------------------
          0  db block gets
        409  consistent gets
         62  physical reads


The number of IOs required to perform that operation is now in line with the actual size of the data.


Tom Kyte (thomas.kyte@oracle.com) has worked for Oracle since 1993. Kyte is a vice president in the Oracle Government, Education, and Healthcare group and the author of Effective Oracle by Design (Oracle Press, 2003) and Expert One-on-One: Oracle (Apress, 2003).

Send us your comments