TECHNOLOGY: Ask Tom
On Promotion, Restriction, and Data LoadingBy Tom Kyte
Our technologist provides missing links, custom sizes, and external table tips.
Steven Feuerstein posted a URL that contains all of his articles that have been published in Oracle Magazine: oracle.com/technetwork/issue-archive/index-087690.html.
I was wondering if you had a similar link and could post it here.
Yes. Oracle Magazine maintains an archive of past issues and—for recurring columns such as Ask Tom—pages that point to them all. My page of links is oracle.com/technetwork/issue-archive/index-093676.html. You can also go to the Oracle Magazine home page at oracle.com/technetwork/oramag/magazine/home for links to the complete content of back issues as well as links to these list pages—near the bottom of the home page under FEATURED COLUMNS.
Undo Size Restrictions
I have some coworkers who believe that it is a good idea to limit the size of the undo tablespace—in this case, to 1 GB per node in two- to five-node Oracle Real Application Clusters systems. The reasoning is that it controls the size of undo and limits large transactions. (They also do this with TEMP tablespaces). I am not a fan of this practice. What is your opinion?
It seems to be a strange way to limit a resource—and rather arbitrary. To meet the requirement “limit the size of a transaction,” this approach would not even appear on my list of solutions. It doesn’t limit the size of individual transactions—it limits only the cumulative size of all concurrently executing transactions to 1 GB. You could still have a single transaction that is 1 GB in size.
To control the size of transactions, the correct method is to use Oracle Database's Resource Manager feature. Almost all systems should be using Resource Manager already, so if your systems are not, you should read up on it at http://docs.oracle.com/cd/E11882_01/server.112/e25494/dbrm.htm#ADMIN027. With Resource Manager you can, among other things, establish an undo quota for undo space. The undo quota limits the amount of undo a resource consumer group can utilize at any given point in time. In this manner, you can limit the amount of undo an application, group of users, or individual users can generate.
But back to this 1 GB arbitrary limit: Some systems will need 1 GB of undo, others will need 500 MB, others will need 10 GB, and others will need other sizes. The amount of undo needed is based on
In the olden days—before automatic undo management in Oracle9i Database—a DBA had to size undo manually. Nowadays this is how I would size my undo:
If you allow the undo tablespace to autoextend, the database will retain undo for at least as long as you requested in the UNDO_RETENTION setting, and it may retain undo even longer to permit long-running queries to complete.
Temporary tablespace space follows the same sort of rules. For some systems, no temporary space might be used. An online transaction processing (OLTP) system, for example, works with tens of rows at a time, so there is virtually no need for temporary space. Other systems, such as a reporting system or a data warehouse, may need many gigabytes or even terabytes of temporary space, depending on the needs of the queries and the number of concurrently executing queries. The one-size-fits-all approach to undo doesn’t work in real life.
Tricky Data Loading
I am using SQL Loader to load data into tables from a flat file. Some sample data in the file might be:
The field values are empno, empname, phone numbers, and salary. The table structures are
create table emp ( empno number(5) primary key, ename varchar2(10), Sal number(10,2) )
I want to insert the data into the emp and emp_contact tables. I don’t know how many values for phone_no the file contains (the number of values for phone_no is not fixed). How do I insert the data into the tables?
This is tricky, because you have to turn a single row of data into many rows in the EMP_CONTACT table.
With SQL Loader (the twentieth-century legacy data loading tool), this is not going to be directly possible. You could create a view on DUAL and place an INSTEAD OF INSERT trigger on that view to procedurally process the data. The trigger would take each row inserted, parse it row by row, and insert the row(s) into EMP and EMP_CONTACT. I’m not going to suggest that approach, though, because it is very code-intensive and does not scale very well. The trigger approach processes slow by slow (row by row), so it doesn’t perform well for data loading.
I am instead going to suggest using the data loading tool of the twenty-first century: the external table. To test this, I start by creating the external table, as shown in Listing 1, so I can query the data as if it were in a regular database table.
Code Listing 1: Creating the external table
SQL> create or replace directory my_dir as '/home/tkyte' 2 / Directory created. SQL> CREATE TABLE et 2 ( "EMPNO" VARCHAR2(10), 3 "ENAME" VARCHAR2(20), 4 "TELNOS" VARCHAR2(1000), 5 "SAL" VARCHAR2(10) ) 6 ORGANIZATION external 7 ( 8 TYPE oracle_loader 9 DEFAULT DIRECTORY MY_DIR 10 ACCESS PARAMETERS 11 ( 12 RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII 13 BADFILE 'MY_DIR':'t.bad' 14 LOGFILE 't.log_xt' 15 READSIZE 1048576 16 FIELDS TERMINATED BY "," OPTIONALLY ENCLOSED BY '"' LDRTRIM 17 MISSING FIELD VALUES ARE NULL 18 REJECT ROWS WITH ALL NULL FIELDS 19 ( 20 "EMPNO" CHAR(255) 21 TERMINATED BY "," OPTIONALLY ENCLOSED BY '"', 22 "ENAME" CHAR(255) 23 TERMINATED BY "," OPTIONALLY ENCLOSED BY '"', 24 "TELNOS" CHAR(1000) 25 TERMINATED BY "," OPTIONALLY ENCLOSED BY '"', 26 "SAL" CHAR(255) 27 TERMINATED BY "," OPTIONALLY ENCLOSED BY '"' 28 ) 29 ) 30 location 31 ( 32 't.dat' 33 ) 34 ) 35 / Table created. SQL> select * from et; EMPNO ENAME TELNOS SAL ————— ————— ————————————————————————————————————————————— ———— 12 smith 1234556@@1234567@@876556612 1200 14 John 1234@@4567@@56789@@12345@@45778@@34566@@23456 1345
So now I can access the data to be loaded with SQL, and once I have that ability, I can do anything I want to it. What I need to do is take each of the rows in Listing 1, parse the TELNOS string around the string ‘@@’, and output a row for each telephone number. Using SQL, I can do that easily, as shown in Listing 2.
Code Listing 2: Querying the external table to output a row for each telephone number
SQL> select empno, ename, sal, i, 2 substr( tnos, 3 instr( tnos, '@@', 1, i )+2, 4 instr( tnos, '@@', 1, i+1 ) 5 -instr( tnos, '@@', 1, i) - 2 ) tno 6 from ( 7 select to_number(et.empno) empno, 8 et.ename, 9 to_number(et.sal) sal, 10 column_value i, 11 '@@'||et.telnos||'@@' tnos 12 from et, 13 table( cast( multiset( 14 select level 15 from dual 16 connect by level <= 17 (length(et.telnos) 18 -length(replace(et.telnos,'@@','')))/2+1 ) 19 as sys.odciNumberList ) ) 20 ) 21 / EMPNO ENAME SAL I TNO ————— ————— ———— —— ——————— 12 smith 1200 1 1234556 12 smith 1200 2 1234567 12 smith 1200 3 876556612 14 John 1345 1 1234 14 John 1345 2 4567 14 John 1345 3 56789 14 John 1345 4 12345 14 John 1345 5 45778 14 John 1345 6 34566 14 John 1345 7 23456 10 rows selected.
On lines 13–19, I generate the set of rows I need for each input row: three rows for the first input record and seven rows for the second. Then I join this set of data to the external table itself—row by row. No join condition is necessary here when I use the construct
SELECT * FROM some_table, TABLE( some_set )
The TABLE() clause executes for each row in SOME_TABLE, and those rows are joined to the row that caused them to be generated. As a result, lines 12–20 have the effect of returning the first row from the ET external table three times and the second row seven times—exactly what I need. On lines 7–11, I select and format the columns I need from the output. I am converting the EMPNO and SAL columns into their proper datatypes, preserving the ENAME field as is, and adding ‘@@’ to the front and back of the TELNOS string, which will make it easier to parse in a moment. Also, I take the output of my table function (the column is implicitly named COLUMN_VALUE) and name it I. This represents the number of that row by each input record—as you can see, it runs from 1 to 3 for the first record and 1 to 7 for the second.
Last, on lines 1–5, I select the fields I need and parse the nth telephone number from the TNOS string. I use INSTR on line 3 to find the nth occurrence of ‘@@’ in the string and then use INSTR again to discover where the (n+1)th occurrence is on line 4. This enables me to compute the starting character and length of the nth telephone entry in the string, and I get the desired output.
Now in order to load the data, I need to take the first row (where I = 1) for each group in Listing 2, insert it into the EMP table, and take all the rows in each group and insert them into the EMP_CONTACT table. I could do that in two passes through the data—that is, just load the ET table into EMP and then load the output of my Listing 2 query into EMP_CONTACT. Rather than process the data twice, however, I’ll use a multitable insert to load both tables in one statement, as shown in Listing 3.
Code Listing 3: Multitable INSERT to load rows into both the emp and emp_contact tables
SQL> create table emp 2 ( empno number primary key, 3 ename varchar2(10), 4 sal number 5 ); Table created. SQL> create table emp_contact 2 ( empno number references emp, 3 phone_no number 4 ); Table created. \ SQL> insert all 2 when (i = 1) then into emp (empno,ename,sal) values (empno,ename,sal) 3 when (i > 0) then into emp_contact(empno,phone_no) values (empno,tno) 4 select empno, ename, sal, i, 5 substr( tnos, 6 instr( tnos, '@@', 1, i )+2, 7 instr( tnos, '@@', 1, i+1 ) 8 -instr( tnos, '@@', 1, i) - 2 ) tno 9 from ( 10 select to_number(et.empno) empno, 11 et.ename, 12 to_number(et.sal) sal, 13 column_value i, 14 '@@'||et.telnos||'@@' tnos 15 from et, 16 table( cast( multiset( 17 select level 18 from dual 19 connect by level <= 20 (length(et.telnos) 21 -length(replace(et.telnos,'@@','')))/2+1 ) 22 as sys.odciNumberList ) ) 23 ) 24 / 12 rows created. SQL> select * from emp; EMPNO ENAME SAL ————— —————— ————— 12 smith 1200 14 John 1345 SQL> select * from emp_contact; EMPNO PHONE_NO ——————— ————————— 12 1234556 12 1234567 12 876556612 14 1234 14 4567 14 56789 14 12345 14 45778 14 34566 14 23456 10 rows selected.
As you can see, I simultaneously loaded the EMP and the EMP_CONTACT tables, using a single SQL statement against the ET external table. Using external tables as your data loading tool offers up a world of possibilities that was just not possible with SQL Loader.
I have a question about joining with collections and cardinality estimation. The optimizer is always estimating that 8,168 rows are coming back from my collection, and because of that, it is using inefficient plans. The estimate of 8,168 is more than two orders of magnitude more than the real cardinality. How can I solve this problem?
This is a long-running issue with pipelined functions and collections during optimization. The optimizer in general doesn’t have any information about the cardinality (the number of rows) being returned by the collection. It has to guess—and that guess is based on the block size (default statistics are block-size-driven). So, for a database with an 8 K block size, the guess is about 8,000. And because it is unlikely that your collection has about 8,000 elements (probably more like 8 or 80 in most cases), you can definitely arrive at a suboptimal plan.
So the question is how do you get the right estimated cardinality value to the optimizer? I’m going to demonstrate four ways here—and I’ll provide a pointer to a fifth way that uses the extensible optimizer. The four methods I’ll demonstrate are
You can read about the fifth method in an excellent write-up by Oracle ACE Adrian Billington at bit.ly/WRhizp.
To demonstrate how to get and work with the right estimated cardinality value, I’m going to use a little pipelined function I’ve used many times over the years. This function takes a delimited string and turns it into a set, as shown in Listing 4.
Code Listing 4: Pipelined function turning delimited string into set
SQL> create or replace type str2tblType as table of varchar2(30) 2 / Type created. SQL> create or replace 2 function str2tbl( p_str in varchar2, p_delim in varchar2 default ',' ) 3 return str2tblType 4 PIPELINED 5 as 6 l_str long default p_str || p_delim; 7 l_n number; 8 begin 9 loop 10 l_n := instr( l_str, p_delim ); 11 exit when (nvl(l_n,0) = 0); 12 pipe row( ltrim(rtrim(substr(l_str,1,l_n-1))) ); 13 l_str := substr( l_str, l_n+1 ); 14 end loop; 15 end; 16 / Function created.
Once I have that function installed, I can try it out by executing a query like this:
SQL> variable x varchar2(15) SQL> exec :x := '1,2,3,a,b,c' PL/SQL procedure successfully completed. SQL> select * from table(str2tbl(:x)); COLUMN_VALUE —————————————————————————————————————— 1 2 3 a b c 6 rows selected.
Code Listing 5: Optimizer making 8 K cardinality estimate
SQL> select * from table(dbms_xplan.display_cursor); PLAN_TABLE_OUTPUT ———————————————————————————————————————————————————————— SQL_ID ddk1tv9s5pzq5, child number 0 ———————————————————————————————————————————————————————— select * from table(str2tbl(:x)) Plan hash value: 2407808827 ——————————————————————————————————————————————————————————————————————————— |Id|Operation |Name |Rows|Bytes|Cost (%CPU)|Time | ——————————————————————————————————————————————————————————————————————————— | 0|SELECT STATEMENT | | | | 29 (100)| | | 1| COLLECTION ITERATOR PICKLER...|STR2TBL|8168|16336| 29 (0)|00:00:01|
Code Listing 6: Using the CARDINALITY hint
SQL> select * from table(dbms_xplan.display_cursor); PLAN_TABLE_OUTPUT ———————————————————————————————————————————————————————— SQL_ID bd2f8rh30z3ww, child number 0 ———————————————————————————————————————————————————————— select /*+ cardinality(sq 10) */ * from table(str2tbl(:x)) sq Plan hash value: 2407808827 ——————————————————————————————————————————————————————————————————————————— |Id|Operation |Name |Rows|Bytes|Cost (%CPU)|Time | ——————————————————————————————————————————————————————————————————————————— | 0|SELECT STATEMENT | | | | 29 (100)| | | 1| COLLECTION ITERATOR PICKLER...|STR2TBL| 10| 20| 29 (0)|00:00:01|
The next approach I can take in Oracle Database 10g and later is the same approach a SQL profile would take: use the OPT_ESTIMATE hint. The OPT_ESTIMATE hint in this case will take three inputs: the type of thing to apply the hint to (a table in this case), the correlation name of that table (SQ in this case), and a scaling factor to apply to the optimizer’s estimate. This scaling factor is a number by which the optimizer will multiply its estimated cardinality to arrive at the desired cardinality. In this case, the scaling factor will be
SQL> select 10/8168 from dual; 10/8168 ———————————————— .00122429
I want 10 instead of 8,168, so I need to use a scaling factor of 0.00122429. When I do that, I get the result in Listing 7.
Code Listing 7: Using the OPT_ESTIMATE hint
select /*+ opt_estimate(table, sq, scale_rows=0.00122429) */ * from table(str2tbl(:x)) sq Plan hash value: 2407808827 ——————————————————————————————————————————————————————————————————————————— |Id|Operation |Name |Rows|Bytes|Cost (%CPU)|Time | ——————————————————————————————————————————————————————————————————————————— | 0|SELECT STATEMENT | | | | 29 (100)| | | 1| OLLECTION ITERATOR PICKLER...|STR2TBL| 10| 20| 29 (0)|00:00:01|
The third approach I can use is dynamic sampling in Oracle Database 11g Release 1 and later. In order for a collection to be sampled, I must add a dynamic sampling hint to the query. Having the dynamic sampling level set in the session or at the system level is insufficient. Listing 8 demonstrates the query with the dynamic sampling hint.
Code Listing 8: Using the DYNAMIC SAMPLING hint
select /*+ dynamic_sampling( sq, 2 ) */ * from table( str2tbl(:x,',') ) sq Plan hash value: 2407808827 ——————————————————————————————————————————————————————————————————————————— |Id|Operation |Name |Rows|Bytes|Cost (%CPU)|Time | ——————————————————————————————————————————————————————————————————————————— | 0|SELECT STATEMENT | | | | 11 (100)| | | 1| COLLECTION ITERATOR PICKLER...|STR2TBL| 6| 12| 11 (0)|00:00:01| ——————————————————————————————————————————————————————————————————————————— Note ——————— dynamic sampling used for this statement (level=2)
Last, Oracle Database 11g Release 2 and later include the Cardinality Feedback feature. I quickly demonstrated this once before (bit.ly/Y7WSjM). Since I wrote that article, a new requirement for Cardinality Feedback to work with collections and bind variables has arisen, and I’ll address that here.
Cardinality Feedback works by having the optimizer change its cardinality estimates after executing a query for the first time and observing that the actual cardinalities were very far off from the estimated cardinalities. That is, the optimizer starts to learn from its mistakes. If it executes a query and discovers that the real row counts are far off from the estimated counts, it will reoptimize the query, using the newly discovered values.
For this to work with collections and bind variables, you have to modify our query slightly. I’ll be using a WITH factored subquery and the materialize hint to have the database physically materialize the collection data into a temporary table. (This is required for Cardinality Feedback to work with bind variables in this case.) Listing 9 demonstrates Cardinality Feedback using the WITH factored subquery and the materialize hint.
Code Listing 9: Using Cardinality Feedback (and the WITH factored subquery and materialize hint)
with sq as ( select /*+ materialize */ * from table( str2tbl( :x ) ) ) select * from sq Plan hash value: 630596523 ————————————————————————————————————————————————————————————————————————————— |Id|Operation |Name |Rows|Bytes|Cost (%CPU)|Time | ————————————————————————————————————————————————————————————————————————————— | 0|SELECT STATEMENT | | | | 32 (100)| | | 1| TEMP TABLE TRANSFORMATION | | | | | | | 2| LOAD AS SELECT | | | | | | | 3| COLLECTION ITERATOR PICKLER...|STR2TBL|8168|16336| 29 (0)|00:00:01| | 4| VIEW | |8168| 135K| 3 (0)|00:00:01| | 5| TABLE ACCESS FULL |SYS_...|8168|16336| 3 (0)|00:00:01| ————————————————————————————————————————————————————————————————————————————— 18 rows selected.
With Cardinality Feedback, I would just use SQ wherever I would have used the actual query. Note that the first time I execute this query, the cardinality estimate is way off—it is that magic number 8,168 (again). However, the optimizer learns from its mistake, and when I execute the query again, I get the result in Listing 10.
Code Listing 11: Getting the corrected cardinality with Cardinality Feedback
with sq as (select /*+ materialize */ * from table( str2tbl( :x ) ) ) select * from sq Plan hash value: 630596523 ————————————————————————————————————————————————————————————————————————————— |Id|Operation |Name |Rows|Bytes|Cost (%CPU)|Time | ————————————————————————————————————————————————————————————————————————————— | 0|SELECT STATEMENT | | | 32 (100)| | | 1| TEMP TABLE TRANSFORMATION | | | | | | 2| LOAD AS SELECT | | | | | | 3| COLLECTION ITERATOR PICKLER...|STR2TBL|8168|16336| 29 (0)|00:00:01| | 4| VIEW | 6| 102 | 3 (0)|00:00:01| | 5| TABLE ACCESS FULL |SYS_...| 6| 12| 3 (0)|00:00:01| ————————————————————————————————————————————————————————————————————————————— Note ——————— - cardinality feedback used for this statement 22 rows selected.
With Cardinality Feedback, I get the correct estimated cardinality. I have to make the mistake once, but the optimizer will self-correct the second time around. So, there are four of the methods for getting the estimated cardinality for collections correct: the cardinality hint, the OPT_ESTIMATE hint, dynamic sampling, and Cardinality Feedback.
Tom Kyte is a database evangelist in Oracle’s Server Technologies division and has worked for Oracle since 1993. He is the author of Expert Oracle Database Architecture (Apress, 2005, 2010) and Effective Oracle by Design (Oracle Press, 2003), among other books.
Send us your comments