by Tom Kyte
Our Oracle technologist shows how to use the unused index and how to play.
I have a table called T that currently contains about 7,500 rows. I am running Oracle8i Release 3 (8.1.7) using the rule-based optimizer. The table looks like this:
CREATE TABLE T ( id varchar2(12), ..... ); CREATE UNIQUE INDEX t_idx ON T ( id );
When I try to run the simple query below, it uses a full-table scan. Why?
select * from T where id = 650474;
First, I have one suggestion: In Oracle8 i Release 8.1.7, use the cost-based optimizer. You paid for a lot of features you won't be able to use without it. The rule-based optimizer is for the support of legacy environments and existing applications.
My second suggestion is that if your columns contain numbers, use the NUMBER datatype. Always use the correct datatypes.
In my book Expert One-on-One: Oracle (Wrox Press, 2001 www.wrox.com), I have a chapter on indexes. One of the sections in that chapter is titled "Why isn't my index getting used?" I have a list of common reasons there, and you are, quite simply, case number 4.
Case 4. You have indexed a character column. This column contains only numeric data. You query using the following syntax: select * from t where indexed_column = 5. Note that the number 5 in the query is a literal number 5 (not a character-string literal). The index on indexed_column is not used in this query, because the query is the same as select * from t where to_number(indexed_column) = 5. You have implicitly applied a conversion function to the column, which nullifies the use of the index. This is very easy to see with a small example, shown in Listing .
You should always avoid implicit conversions. Always compare apples to apples and oranges to oranges. Another case where this comes up frequently is with dates. You try a query to find all records for today:
select * from t where trunc(date_col) = trunc(sysdate);
You discover that the index on DATE_COL will not be used. You can either index the TRUNC of DATE_COL or query using BETWEEN . The following demonstrates the use of BETWEEN on a date. Once you realize that trunc(date_col) = trunc(sysdate) is the same as date_col between sysdate and sysdate PLUS one day , doing the BETWEEN is straightforward:
select * from t where date_col between trunc(sysdate) and trunc(sysdate)+1 and date_col <> trunc(sysdate)+1;
You need the extra predicate date_ col <> trunc(sysdate)+1 because BETWEEN is inclusive, and you need to remove the rows that happened on midnight of that date.
Note that another popular approach is to subtract one second from:
trunc (sysdate)+1: where date_col between trunc(sysdate) and trunc(sysdate)+ 1-1/(24*60*60)
In light of Oracle9 i Database support for the TIMESTAMP datatype, with times that are more precise than one second, the latter approach is dangerous.
Using the BETWEEN clause moves all of the functions to the right-hand side of the equation, allowing you to use the index on DATE_COL . (This has the same effect as where trunc(date_col) = trunc(sysdate) . If possible, you should always remove the functions from database columns when they are in the predicate. Not only will this allow for more indexes to be considered for use, it will also reduce the amount of processing the database needs to do. In the above case, when we used where date_col between trunc(sysdate) and trunc(sysdate)+1 and date_col <> trunc(sysdate)+1; , the values are computed once for the query, and then an index could be used to find only the qualifying values. When we used trunc(date_col) = trunc (sysdate) , the trunc(date_col) had to be evaluated once per row for every row in the entire table.
Why is V$OPEN_CURSOR.SQL_TEXT showing things like "table_4_xxxx_x_x_x_ x"? Most of the entries in V$OPEN_CURSOR look like this. My database clients are JDBC thin drivers against Oracle8i Release 8.1.7 on Sun Solaris 8. For example:
select user_name, count(1) as count, sql_text from sys.v_$open_cursor group by user_name, sql_text order by count; USER_NAME COUNT SQL_TEXT --------- ----- --------------------- ORACLE 1 select user_name, ... SGMASTER 1 SELECT PROPERTY_N... SGMASTER 5 INSERT INTO ... SGMASTER 7 table_4_200_5ee2_0_0_0 SGMASTER 12 table_4_2000_5ed8_0... SGMASTER 12 table_4_200_5ed8_0_0_0 SGMASTER 12 table_e_400_5ed8_2_0_0
These entries are connected to implicit cursors surrounding LOBs. When you have a LOB open for reading/writing, you will see one of these mysterious entries in V$OPEN_ CURSOR . You can see them with nested tables as well (when you create or access them).
In this case, it looks as if all of your entries are LOB cursors. The entries with TABLE_4_* indicate LOBs that are opened for reading, while the TABLE_E_* entries indicate LOBs that are opened for writing. Consider the example in Listing 2 for an LOB opened for reading, and the example in Listing 3 for an LOB opened for writing.
As you can see by the WRITE mode example in Listing 3, the TABLE_E entry appears. Interestingly, the number in the middle of the SQL_TEXT (9912, in this case) is actually the OBJECT_ID of the table containing the LOB in hex:
SQL> select object_name, object_type 2 from user_objects 3 where object_id = to_number('9912', 'XXXX'); OBJECT_NAME OBJECT_TYPE ----------- ----------- T TABLE
So you can use these entries not only to see the open LOBs but also to determine what tables they come from.
A query returns a result set in a format as follows:
ID LINK CNT _____ ____ ___ 40032 32 1 40033 43 1 40034 16 6 40034 22 6 40034 28 2 ...
There are several hundred thousand rows in the result set. The desired result: For each ID, return the MAX(CNT) associated with the ID, along with a related LINK. If an ID has two LINKs with the same MAX(CNT) (for example, 40034), it does not matter which LINK is used for the final result.
So, for this example, an acceptable result for the query would be:
ID LINK CNT _____ ____ ___ 40032 32 1 40033 43 1 40034 16 6 ...
Returning a LINK along with the MAX(CNT) is causing me some grief. Could you offer a suggestion?
Sure. The question you raise is asked very frequently: how to aggregate and get some related (but not grouped-by) data along with the results. In this case, you have MAX(CNT) and GROUP BY ID , but you also want to get a related LINK .
A trick I like to use for this situation looks like this:
select id, to_number(substr(data, 1, 9)) cnt, to_number(substr(data, 10)) link from ( select id, max( to_char(cnt,'fm000000009') || link ) data from t group by id );
The trick here is to encode both the column on which you want to perform the aggregate operation (CNT) and the related data. You must do the encoding in such a fashion that the aggregate function—such as MAX —still works. In this example, assuming CNT is a natural number (0, 1, 2, ... and so on) up to nine digits in length, you'll get what you want. The MAX will take place on the first nine characters (zero padded, according to our format). All you have to do afterward is SUBSTR out the pieces of information in which you are interested. It might make more sense to see this in three steps, shown in Listing 4, so you can see what is going on.
As you can see in the final result in Listing 4, the GROUP BY ID has reduced the result set down to just three rows, one for each unique ID . The column that is left contains the MAX(CNT) for each ID . In the case where the CNT column was repeated (for ID 40034), you got the MAX(CNT || LINK) . Now all you need to do is SUBSTR out the relevant pieces of information from DATA , as shown in Listing 5.
Interestingly, this result set is different in the second query in Listing 5; the LINK is 16 in this case, but it is just as "correct" as the previous query. You just found the first LINK for a given ID/CNT tuple using ROWNUM .
Another approach makes use of the analytic functions feature, available in Oracle8 i Release 2 and later, as shown in Listing 6.
You partition the result set by ID , and for each ID group, you order the rows by CNT DESC and pick off the FIRST_VALUE you see. As you continue to build the query in Listing 6 , use the ROW_NUMBER function to assign an increasing number to each row in the partitions. By keeping just the first row from each partition, you get exactly what you want: the MAX(CNT) and some LINK . This method is superior to the first method I described because it does not require the "encoding" trick. Remember, however, that analytic functions are available with Oracle8 i Enterprise Edition Release 2 and later only.
I was executing the following SQL statement with the table in nologging mode, but it seems the changes are logged (and not just the dictionary changes):
SQL> insert /*+ append */ into testusr.contactversion nologging select * from perm_owner.contactversion; 297585 rows created. Statistics ----------------------------- 12975 recursive calls 1798143 db block gets 56538 consistent gets 9882 physical reads 125409912 redo size
It does not matter if the table is set to LOGGING or NOLOGGING; the same amount of REDO is generated.
The answer in this case is simple: You have indexes on the table. Indexes cannot be appended; they must be modified. And these modifications must be logged. The redo/undo for the table itself is not generated, but the redo/undo for the indexes always is, as the following demonstrates:
SQL> create table t nologging as select* from all_objects where 1=0; Table created. SQL> set autotrace on statistics; SQL> insert /*+ append */ into t select * from all_objects; 34855 rows created. Statistics ------------------------------ 151 recursive calls 116 db block gets 474434 consistent gets 2173 physical reads 10208 redo size ... SQL> truncate table t; Table truncated. SQL> create index object_name_idx on t(object_name); Index created. SQL> insert /*+ append */ into t select * from all_objects; 34856 rows created. Statistics ------------------------------- 205 recursive calls 3424 db block gets 474764 consistent gets 2465 physical reads 4963388 redo size ...
With an index present, there is significantly more REDO generated. The REDO for the first insert was the redo necessary to protect the data dictionary (because we advanced the high-water mark for the table). The redo for the second insert was to protect the data dictionary and the index structure, so the second redo size statistic reflects the information that enables the redo for the index changes.
If possible, before you do a large direct-path insert like that, disable your indexes, do the load, and then rebuild them in parallel with the NOLOGGING option. In that fashion, you'll end up with a nicely compacted index and the best load times end-to-end.
Columnist Tom Kyte (email@example.com) has worked for Oracle since 1993. Kyte is a vice president in the Oracle Government, Education, and Health group the author of Expert One-on-One: Oracle, and the coauthor of Beginning Oracle Programming (Both published by Wrox Press, www.wrox.com).