By Tom Kyte
Our technologist searches, numbers, analyzes, and orders.
I have a control table that contains various values: numeric, character, and dates. They are all stored as VARCHARs, but when a value is numeric, can I perform any easy check to verify that a value is a number? Is there a function "isnumeric" that will do this?
I prefer to store dates as dates and numbers as numbers and would probably be looking for a different solution, if this table is going to be of any real size in the future. There are no native "isnumeric"-type functions in Oracle; however, it is very easy to write one. Take the following function, for example:
SQL> create or replace 2 function isnumeric 3 ( p_string in varchar2) 4 return boolean 5 as 6 l_number number; 7 begin 8 l_number := p_string; 9 return TRUE; 10 exception 11 when others then 12 return FALSE; 13 end; 14 / Function created.
This function will return TRUE if the input string can be converted to a number by use of the current session's NLS settings. For dates, you can get a bit fancier and pass in the format to use:
SQL> create or replace 2 function isdate 3 ( p_string in varchar2, 4 p_fmt in varchar2 := null) 5 return boolean 6 as 7 l_date date; 8 begin 9 l_date := 10 to_date(p_string,p_fmt); 11 return TRUE; 12 exception 13 when others then 14 return FALSE; 15 end; 16 / Function created.
Analytics to the Rescue (Again)
I have the following test data:
select * from test_sort; T_NM P_NM ID ---- ---- -- 01t 02P 36 01t 01p 37 03t 02P 38 03t 01p 39 02T 02P 40 02T 01p 41
I would like to run a query that produces data in the following order:
T_NM P_NM ID ---- ---- -- 01t 01p 37 01t 02P 36 03t 01p 39 03t 02P 38 02T 01p 41 02T 02P 40
The sorting is based on the minimum ID by T_NM (36 for every 01t, 38 for every 03t, and 40 for every 02T) and then sorting within that by P_NM. I've tried various subqueries and analytic functions (rank, dense_rank, row_number). It seems so simple, but I can't seem to get it. Any ideas?
The answer here is hidden in the question; you described the sorting as "based on the minimum ID by T_NM ." That is almost the solution in this case. All you need to do is type in this SQL:
SQL> select t_nm, p_nm, id, 2 min(id) 3 over (partition by t_nm) 4 min_id 5 from test_sort 6 order by 4, 2 7 / T_NM P_NM ID MIN_ID ---- ---- -- ------ 01t 01p 37 36 01t 02P 36 36 03t 01p 39 38 03t 02P 38 38 02T 01p 41 40 02T 02P 40 40 6 rows selected.
You can now order by the minimum ID by T_NM and then, within that, by the P_NM column. Analytics rock.
We have a third-party application we use for selecting specific values from several tables. The character values stored in a field can be in any of the following formats: Bell, bell, BELL, beLL, and so on. Is there a way to force (change) values (either to upper- or lowercase) during DML other than by using triggers? And if not, is there a way to force a SELECT query (other than using functions UPPER or LOWER) to return all the values, regardless of case?
In Oracle 10g, there is a way to do this. Prior to Oracle 10g, you had to adopt one of the following strategies:
Use a function-based index on UPPER ( column_name ) and modify the queries to use WHERE UPPER ( column_name ) = value.
Use a trigger to roll the column value to upper- or lowercase upon modification.
Use Oracle Text to create a TEXT index on the column; text indexes (which would then mandate the use of the CONTAINS operator in the predicate) can be case-sensitive or -insensitive.
In each of these cases, one of your conditions would have been violated. You would have needed to use triggers, or UPPER() or CONTAINS in the WHERE clause.
In Oracle 10g, you can do this transparently. Suppose you have the following table of data:
SQL> select * from t; DATA -------------- Hello HeLlO HELLO
And suppose you run the following query:
SQL> variable x varchar2(25) SQL> exec :x := 'hello'; SQL> select * 2 from t 3 where data = :x; no rows selected
Nothing in that table matches "hello" in lowercase. Well, with some ALTER SESSION s, you can change all that:
SQL> alter session 2 set nls_comp=ansi; Session altered. SQL> alter session 2 set nls_sort=binary_ci; Session altered. SQL> select * 2 from t 3 where data = :x; DATA -------------- Hello HeLlO HELLO
This is the same query with the same input data, but with a very different result. The effect of these two ALTER SESSION directives is to enable case- insensitive comparisons for =, <, and >.
You can, in fact, take this one step further and even index the data in a case-insensitive fashion, allowing searches on this data to utilize indexes when it makes sense. Consider the following example:
SQL> create index t_idx on 2 t( nlssort( data, 3 'NLS_SORT=BINARY_CI' ) ); Index created. SQL> begin 2 dbms_stats.set_table_stats 3 (ownname=>user, 4 tabname=>'T', 5 numrows=> 100000); 6 end; 7 /
So, I created a function-based index on the DATA column and used the binary case insensitive sort in doing so. Then I faked out the optimizer and told it that the table has lots of rows. Using AUTOTRACE , you can see that the optimizer can and will use an index to access this information:
SQL> set autotrace on SQL> set linesize 121 SQL> select * 2 from t 3 where data = :x; DATA -------------- Hello HeLlO HELLO Execution Plan ------------------------------ SELECT STATEMENT (Cost=2) TABLE ACCESS (BY INDEX ROWID) INDEX (RANGE SCAN) OF 'T_IDX'
Please explain how to remove duplicate records quickly in a single run from a large table containing about 5 million records. I tried it with the following query, but it takes 10 hours:
delete from test1 where rowid not in (select min(rowid) from test1 group by rc_no);
Even after increasing the rollback segment tablespace to 7GB, we are not getting the desired results.
If you are using the rule-based optimizer (RBO), there is a very good chance that the NOT IN query is being evaluated once for each row in the outer query! So, hopefully, you are using the much smarter cost-based optimizer (CBO).
But in any case, my approach to removing duplicates is a little different. I would generate the set of rowids to delete by using analytics and then delete them, like this:
SQL> select count(*), 2 count(distinct cust_seg_nbr) 3 from t; COUNT(*) COUNT(CUST_SEG_NBR) --------- ------------------- 1871652 756667
Here you can see that I have more than 1.8 million rows but only 756,667 unique ones. I need to delete about 1.1 million of my rows (lots). Let's see how speedy this can be:
delete from t where rowid in (select rid from (select rowid rid, row_number() over (partition by cust_seg_nbr order by rowid) rn from t ) where rn <> 1 ) 1114985 rows deleted. Elapsed: 00:01:46.06 SQL> select count(*), 2 count(distinct cust_seg_nbr) 3 from t; COUNT(*) COUNT(DISTINCTCUST_SEG_NBR) --------- --------------------------- 756667 756667
Now, that was the speed on a laptop computer (so your mileage may vary). Also, in my case, the table was not indexed. Each index is going to add more processing time. Consider disabling your indexes for a mass duplicate removal like this and then rebuilding your indexes afterward. Index maintenance can be very expensive on a large bulk operation in which you plan to delete most of the data.
As for the rollback size, it will necessarily get as big as it needs to be in order to process the delete. Every index you have will make the delete need more space, in addition to making it take more time.
Another very viable option when mass-deleting data is to create a new table that keeps just the right records and drops the old table, as in the following example:
SQL> select count(*), 2 count(distinct cust_seg_nbr) 3 from t; COUNT(*) COUNT(CUST_SEG_NBR) --------- ------------------- 1871652 756667 create table t2 as select cust_seg_nbr, ... from (select t.*, row_number() over (partition by cust_seg_nbr order by rowid) rn from t ) where rn = 1 / Table created. Elapsed: 00:00:10.93
As you can see, it took 11 seconds in this example to copy out the rows to keep, versus 1 minute and 46 seconds to remove the rows you don't want, and you can do this without generating any UNDO (rollback) or REDO to boot.
I have a table with a VARCHAR2 field, and I want to query everything from the table where the VARCHAR2 field has a valid number. For example, if my table has the following data:
name seq ---- --- ab 10 cd 1- ef 1a gh 12 ij 1.0
I want the query result to be
ab 10 gh 12
I reject everything that does not have a number format. I will use the query you give me very often to do updates in that table.
Well, this is very similar to the earlier item on the "isnumeric" function. In fact, the solution is just about the same, but I'll add a function-based index to aid in identifying the rows, using an index when appropriate.
What I'll do is write a very small PL/SQL function that is deterministic (meaning that the function always returns the same answer, given the same inputs). In this case, I'll have it return only two values: either 1 (meaning yes, the input was a number) or NULL (meaning no, the input was not a number).
SQL> create or replace 2 function is_number 3 ( p_string in varchar2 ) 4 return number 5 deterministic 6 as 7 l_num number; 8 begin 9 l_num := p_string; 10 return 1; 11 exception 12 when others then 13 return null; 14 end; 15 / Function created.
Now, I can take any one of your tables with this VARCHAR2 column and issue the following:
SQL> create index t_idx 2 on t(is_number(data));
If you query SELECT * FROM T WHERE IS_NUMBER(data) = 1 , you'll get all of the rows with valid numbers in them via the index, if appropriate. Note that this index will be very compact, because Oracle B*Tree indexes do not create entirely NULL entries, so any row in the table that does not have a number in it will not be represented in the index.
Unless I hard-code the ORDER BY clause in a hard ref cursor, the ORDER BY gets ignored when I'm fetching from the cursor. Can I achieve dynamic ordering on Oracle8i Release 3 (8.1.7) or any higher version? I can use parameters in all other places I've tried except the ORDER BY. Here is a tiny piece of sample code:
open p_cursor for select ename, empno From emp order by decode( variable, '1', 'ENAME', '2', 'EMPNO' )
It is not that the ORDER BY DECODE was ignored; it's that the ORDER BY DECODE you used was akin to
select * from t order by 'hello world';
That is what you are doing in effect—ordering by a constant string. So the ORDER BY was not ignored—it just was not what you wanted to do!
Here are two methods to achieve this. The first uses DECODE ; the other uses dynamic SQL. Let's take a look at DECODE first. I can write
order by decode( l_sort_by, '1', ename, '2', to_char(empno, '0009'))
This DECODE always returns a character string to sort by. If l_sort_by = 1 , it returns ename to sort or returns the EMPNO formatted as a fixed-width string field so it sorts properly. If you have a date, you will use to_char(dt, 'yyyymmddhh24miss') . If you have numbers that can be negative, you'll need to find a format that sorts them correctly. The important thing is to return a character string that sorts properly for you in all cases.
Next, we can use what I regard as the easier method of dynamic SQL or, if you have just a few columns, a static SQL statement opened in an IF/THEN/ELSIF block. First the dynamic SQL approach:
begin l_query := 'select ...'; if (l_sort_by = '1') then l_query := l_query || 'ename'; elsif (l_sort_by = '2') then l_query := l_query || 'empno'; ... end if; open l_cursor for l_query;
The following static SQL example is not too different and would be my choice for a situation with only two or three ORDER BY s (because static SQL is preferable to dynamic SQL, for many reasons):
begin if l_sort_by = '1' then open l_cursor for select ... order by ename; else open l_cursor for select ... order by empno; end if; loop fetch l_cursor into ...; exit when l_cursor%notfound;
You mentioned that Oracle Database 10g reallocates space in certain situations, such as the deleting of every other row in a table. Please elaborate.
Online segment shrink is available for tables in ASSM (Automatic Segment Space Management) tablespaces. Conceptually, what happens is that Oracle reads the table from the bottom up, and upon finding rows at the bottom of the table, it deletes them and reinserts them at the top of the table. When it runs out of space at the top, it stops, leaving all the free space at the end—or bottom—of the table. Then Oracle redraws the high-water mark for that table and releases that allocated space. Here is a quick example:
SQL> create table t 2 ENABLE ROW MOVEMENT 3 as 4 select * 5 from all_objects; Table created.
Here I created a table with ENABLE ROW MOVEMENT . Oracle will be physically moving the rows, and this clause gives Oracle permission to change the rowids. Here's what a full scan of this big table does:
SQL> set autotrace on statistics SQL> select count(*) from t; COUNT(*) ----------- 47266 Statistics ------------------------------- 0 db block gets 724 consistent gets 651 physical reads SQL> set autotrace off
It took 724 logical IOs ( consistent gets ) to read that table and count the rows. A peek at USER_EXTENTS shows the table consuming 768 blocks in 20 extents.
Over time, I perform some deletes on this table, leaving behind lots of white space. I'll simulate that by deleting every other row in the table:
SQL> delete from t 2 where mod(object_id,2) = 0; 23624 rows deleted.
READ more about function-based indexes
SQL> alter table t shrink space compact; Table altered. SQL> alter table t shrink space; Table altered.
Another peek at USER_EXTENTS shows that the table now consumes 320 blocks in 17 extents. The table has actually shrunk while still online and without a rebuild. It is now half its original size in blocks, because it released extents back to the system—something that was never possible before. Further, look what this shrinking does for a full scan:
SQL> select count(*) from t; COUNT(*) ----------- 23642 Statistics ------------------------------- 0 db block gets 409 consistent gets 62 physical reads
The number of IOs required to perform that operation is now in line with the actual size of the data.