As Published In
Oracle Magazine
May/June 2013

TECHNOLOGY: Ask Tom

  

On Efficiency, Avoidance, and Performance

By Tom Kyte Oracle Employee ACE

 

Our technologist uses brutally efficient DDL, answers trick questions, and suggests a scalar subquery.

I have a table that contains millions of records. I want to update and commit for a certain number of records—say 10,000—but I don’t want to do it in one stroke, because I may end up with rollback segment issues. Do you have any suggestions on how best to do this?

I get this question often, and the answer is: do not use UPDATE (or DELETE if you are purging instead of updating). UPDATE and DELETE are great in online transaction processing (OLTP) systems in which you are modifying a few records (such as 1 or 10 at a time). UPDATE and DELETE are not for massive operations—the UNDO they must generate and the REDO that is generated as a result are overhead you don’t need for a massive batch operation.

This is a good case for data definition language (DDL)—not data manipulation language (DML). You will bypass all UNDO generation with DDL and have the opportunity, if appropriate, to skip REDO generation as well (by using NOLOGGING). At the very least, you will minimize the amount of REDO you generate. Additionally, you will bypass the inefficient buffer cache during this massive operation. Yes, I said the inefficient buffer cache.

Think about the math involved in updating millions of rows on, for example, 1 million blocks (about 7.5 GB of table data). You will have to read each and every block into the buffer cache (I’ll assume a full table scan in this case). So, you’ve read the entire table at least once—1 million blocks—and then for every row you modify, you’ll have to put the “before image” into an UNDO segment and modify (dirty) the table block in the buffer cache. So you’ll write into the UNDO segment for each and every row with the additional bookkeeping information you track in the UNDO segment, meaning that you’ll probably put twice as many or more UNDO blocks into the UNDO segment as you have table blocks. The result: you’ll write 3 million blocks into the buffer cache, one at a time—1 million table blocks and 2 million UNDO blocks. That will probably exceed the buffer cache for most people, and even if it doesn’t, things such as checkpoints and other sessions needing free blocks in the cache will cause those blocks to be written to disk. At this point, you’ve done 1 million block reads and 3 million block writes, and that doesn’t even count the REDO you’ve generated. The REDO for this operation would be huge, because it protects the table blocks and the UNDO block you’ve generated. And all of that has to be written to disk before the commit as well.

Now compare that with using a CREATE TABLE AS SELECT statement to retrieve and write out the rows as they would be modified. Instead of updating the rows, you write a query that selects the modified rows and puts them into a new table. You can skip the UNDO and, optionally, the REDO generation entirely. You bypass the buffer cache for the modifications—they go straight to disk—and there’s no competition with other sessions for blocks in the cache. You perform significantly less I/O overall and use less CPU time (much less, typically), because you don’t have to maintain the data structure that is the buffer cache. You’ll then index and add any constraints to your new table, drop the old table, and rename this new table.

Even if you were updating or deleting a small percentage of a table, this DDL approach often makes sense. Suppose you were to update 1 percent of a 100-million-row table. And suppose this table stored about 100 rows per block and that the rows you needed to update were more or less uniformly distributed throughout the table. You would end up reading the entire table anyway—because every block needs about 1 row updated in it—generating all the UNDO and flooding the buffer cache again with dirty blocks that need to be written to disk one by one. Some might say, “But because you are touching just 1 percent of the rows, you’d naturally want to use an index,” but that wouldn’t be true either. You’d need to read almost every block in the table anyway, so doing a full table scan would be the right approach. Retrieving millions of rows via an index when the data is spread all over the place is not feasible in the real world. A quick look at the math will bear that out again. A typical single block I/O takes about 5 milliseconds on average. If you have 1 million blocks to read, that means 5 million milliseconds or 5,000 seconds—or almost one and a half hours! If you did those I/Os with full table scans instead, you would be able to read that data in a few seconds. Even assuming a slow transfer rate of 100 MB/sec, you would be able to read that data in about 75 seconds.

So, rather than read and write every block of the table via the buffer cache and generate large amounts of unnecessary UNDO and possibly REDO as well, use what I call “brutally efficient” DDL to perform these operations.

To view some testaments to this approach, check out bit.ly/WDML5Y. “How to Update Millions of Records in a Table” is a question that was first asked more than a decade ago and whose answer has grown over the years, the last update to it being just a few days ago.

 

Which Index Is Better in a Partitioned Table: Global or Local?

I have a partitioned table based on a date, say startdate, and interval partitions for each day. I use a query that generates a report based on days (such as a report for the previous five days). Also I use queries that generate a report based on hours (such as a report for the previous five hours). There are queries that access data within a partition and across partitions as well. Please advise whether I should use a global or a local index on startdate.

This is a trick question. It sounds like, in most cases, you do not want any indexes! If you are going to cross partitions—hitting five days’ worth of data—hopefully you will not be using an index at all. Ideally you will be using a full table scan of the five partitions because you are hitting every row anyway. One of the reasons to partition data is to avoid inefficient indexes. Indexes are good for getting a few rows from a table, but once you start retrieving numerous rows, indexes become terribly inefficient, as the math I demonstrated in the previous answer shows.

If all of your queries include startdate and other columns in the predicate, and you think you’ll typically hit a few partitions—a handful (say fewer than five at the most)—you’ll probably want to employ locally partitioned indexes for almost all of your indexes.

Additionally, startdate doesn’t need to be in all these indexes—they do not need to be prefixed with startdate. (You might want an index that starts with startdate, but only when you are querying the previous N hours.)

For example, suppose you have queries like this:

 

select .... 
  from t
 where startdate 
   between sysdate and sysdate-5
   and x > 100;

select .... 
  from t
 where startdate 
   between sysdate and sysdate-2
   and x > 100;

 

For these queries, it might make sense to have a locally partitioned index on x—and just on x. You know that the first query will hit just five partitions, because you are using a daily partitioning scheme. If x > 100 returns a very small number of rows from each of those five partitions, an index on x—and just on x—will be appropriate. The query will do five index range scans (which is acceptable) to find a few rows. The predicate on startdate will indicate which index partitions to range-scan and then range-scan each of the five for x > 100. The second query will do just two index range scans (which is, again, acceptable).

You will want a globally partitioned index on x if you have queries such as 

select .... 
  from t
 where startdate 
   between sysdate and sysdate-50
   and x > 100;

select .... 
  from t
 where x > 100;

 

This assumes again that x > 100 returns a small number of rows from the candidate set of data. You’d want a global index on x in this case, because doing 50 index range scans is becoming unacceptable—it is too much work to range-scan 50 times. And for the second query, you’d do N range scans, where N was the number of partitions in the table, which again is probably unacceptable.

Finally, if you have queries such as 

select .... 
  from t
 where startdate 
   between sysdate and sysdate-5;

select .... 
  from t
 where startdate 
   between sysdate and sysdate-2;

 

you won’t want any indexes. Do a full table scan of the partitions, and be done with it. An index is useful for retrieving a few rows from a large set of data, but if you are looking at every row in a set of data, indexes are something to be avoided.

If you query the last five hours of data, again, you probably don’t want any indexes either. At the very beginning of the day, you’ll be returning about 20 percent of yesterday’s data—too much data for an index to retrieve. Early in the day, you’ll be returning every row from the current partition, so an index won’t make sense here either. In the middle of the day, at noon, you’ll be returning almost 50 percent of the data in that partition. At the end of the day, you’ll be returning about 20 percent of the data. That is too much of the data to be using an index—a nice full table scan would be best.

You’ll want an index on startdate only if you run a query that returns a few rows from a large number of rows. For example, if you ask for a five-minute window of data, indexing startdate will make sense; otherwise, it probably won’t.

Temporary Space Usage

DBAs preach to developers that they shouldn’t implement their own fancy joins on the client side but instead leave joining to the database server, because in most cases the database will do a much better job. Yet Oracle Database seems to expand all data in the TEMP tablespace when processing it, which sometimes results in unnecessary sorts on disks and is incredibly slow. As far as I know, there are algorithms that minimize the size of the memory for sorting (working similarly to pointers), and for a long time I thought Oracle Database would implement some sort of TEMP optimization. (I know about the optimization done by Oracle Net Services, and I can easily verify that with a network sniffer.)

This problem with TEMP can be demonstrated with a detail table that contains many rather small records joining just a few records from a very “fat” master table. If I go after a single fat master record that is joined to many thousands of “skinny” detail records and order it by some attributes of only the detail records, the database will place the joined master/detail record into TEMP for sorting—resulting in thousands of fat records—and then sort it. In this special case, I think it would be more efficient to sort the detail records and then pick up the master record.

Is there an Oracle Database solution to this? One that would let me sort the detail records and then pick up the associated master record so as to not spill into TEMP?

I’m not a DBA, and I preach to developers to not join on the client side as well. But there is an approach you can take: use scalar subqueries.

First, let me comment on the statement “I know about the optimization done by Oracle Net Services, and I can easily verify that with a network sniffer.”

This is something I demonstrated in the January/February 2012 issue of Oracle Magazine; you can read about that optimization in the “Oracle Net Services compression” subsection.

Now, on to the question at hand. I’ll start by setting up a master/detail relationship that can be used to demonstrate the issue. I’ll start by creating the master table, as shown in Listing 1.

Code Listing 1: Creating the wide master table 

SQL> create table master
  2  (
  3  id  number primary key,
  4  c1  varchar2(4000) default rpad('*',4000,'*'),
  5  c2  varchar2(4000) default rpad('*',4000,'*'),
  6  c3  varchar2(4000) default rpad('*',4000,'*'),
  7  c4  varchar2(4000) default rpad('*',4000,'*'),
  8  c5  varchar2(4000) default rpad('*',4000,'*'),
  9  c6  varchar2(4000) default rpad('*',4000,'*'),
 10  c7  varchar2(4000) default rpad('*',4000,'*'),
 11  c8  varchar2(4000) default rpad('*',4000,'*'),
 12  c9  varchar2(4000) default rpad('*',4000,'*')
 13  );
Table created.

 

As you can see, the master table is quite wide—each row is about 36 K in size. Now I’ll define the detail table associated with this master: 

SQL> create table detail
  2  (
  3  id         number  primary key,
  4  master_id  number references master,
  5  data       varchar2(40)
  6  );
Table created.

 

This table is quite narrow—about 40 bytes in width—orders of magnitude smaller than the master table.

Last, I’ll populate each table with some data. I am going to place only one row in the master table, because what I am investigating is the temporary space utilization for a query that would return one or only a few master records and thousands of related detailed records. The code in Listing 2 populates both tables.

Code Listing 2: Inserting values into the master and detail tables 

SQL> insert into master( id ) values ( 1 );
1 row created.

SQL> insert into detail (id, master_id, data )
  2  select rownum, 1, object_name
  3    from all_objects;
72900 rows created.

SQL> exec dbms_stats.gather_table_stats( user, 'MASTER', cascade=>true );
PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats( user, 'DETAIL', cascade=>true );
PL/SQL procedure successfully completed.

 

And now I can demonstrate the TEMP space issue by joining the master table to the detail table for the one master record and sorting by some attribute of the detail table, as shown in Listing 3.

Code Listing 3: Joining the master and detail tables and sorting by a detail attribute 

SQL> set autotrace traceonly explain

SQL> select *
  2    from master m ,detail d
  3   where m.id = d.master_id
  4     and m.id = 1
  5   order by d.data;

Execution Plan
————————————————————————————————————
Plan hash value: 3160665378

———————————————————————————————————————————————————————————————————————————————
|Id|Operation                  |Name  |Rows |Bytes |TempSpc|Cost(%CPU)|Time    |
———————————————————————————————————————————————————————————————————————————————
| 0|SELECT STATEMENT           |      |72900| 2505M|       | 546K  (1)|01:49:15|
| 1| SORT ORDER BY             |      |72900| 2505M|   569M| 546K  (1)|01:49:15|
| 2|  NESTED LOOPS             |      |72900| 2505M|       | 138   (1)|00:00:02|
| 3|   TABLE ACCESS BY INDEX...|MASTER|    1|36012 |       |   1   (0)|00:00:01|
|*4|    INDEX UNIQUE SCAN      | ...  |    1|      |       |   0   (0)|00:00:01|
|*5|   TABLE ACCESS FULL       |DETAIL|72900| 2278K|       | 137   (1)|00:00:02|
———————————————————————————————————————————————————————————————————————————————

Predicate Information (identified by operation id):
——————————————————————————————————————————————————————

   4 - access("M"."ID"=1)
   5 – filter("D"."MASTER_ID"=1)

 

As you can see, the optimizer is suggesting that I would need about 569 MB of temporary space. That is because I will join the single master row of about 36 K to 72,900 detail records, each approximately 40 bytes in size. When you multiply the 36 K by 72,900, you get about 2,505 MB of output, and the optimizer is estimating that I will need around half a gigabyte on disk in TEMP to store some of this during my order by—the sort step.

Next Steps


 ASK Tom
Tom Kyte answers your technology questions. Highlights from that forum appear in this column.

 FOLLOW Tom on Twitter

READ
 more Tom
 Oracle Database Concepts 11g Release 2 (11.2)

 DOWNLOAD Oracle Database 11g Release 2

LEARN more about
 updating millions of records
 1408549optimization with Oracle Net Services
 scalar subqueries

 Oracle Database Tom on Twitter

FOLLOW Oracle Database
 on Twitter
 on Facebook
 

In this particular edge case, you have tens of thousands of skinny detail records joined to a single wide master record, so it would be better to sort the detail records and then pick up the associated master record. To accomplish this, I first need to create an object type that looks like the master table: 

SQL> create or replace 
type myScalarType as object
  2  (
  3  id  number ,
  4  c1  varchar2(4000),
  5  c2  varchar2(4000),
  6  c3  varchar2(4000),
  7  c4  varchar2(4000),
  8  c5  varchar2(4000),
  9  c6  varchar2(4000),
 10  c7  varchar2(4000),
 11  c8  varchar2(4000),
 12  c9  varchar2(4000)
 13  );
 14  /

Type created.

 

Now I can select all the columns of the master table as a single “column,” using SQL similar to 

select myScalarType(id,c1,c2, .. c9),
  from master

 

This will enable me to use a scalar subquery against the master table when querying the detail table. If you recall from a previous article on scalar subqueries (“On Caching and Evangelizing SQL”), a scalar subquery must return at most a single row and a single column. To get multiple columns from a scalar subquery, you must use a type that has multiple attributes, such as my newly defined object type.

Once I have that multiple-attribute type, I write my query to retrieve the above data, as shown in Listing 4.

Code Listing 4: Using a scalar subquery alternative 

SQL> select x.master_data.id,
  2         id detail_id,
  3         data,
  4         x.master_data.c1,
  5         x.master_data.c2,
  6         x.master_data.c3,
  7         x.master_data.c4,
  8         x.master_data.c5,
  9         x.master_data.c6,
 10         x.master_data.c7,
 11         x.master_data.c8,
 12         x.master_data.c9
 13    from (
 14  select d.*,
 15         (select myScalarType( id, c1, c2, c3, c4, c5, c6, c7, c8, c9)
 16            from master
 17           where master.id = d.master_id) master_data
 18    from detail d
 19   where d.master_id = 1
 20         ) x
 21    order by data
 22  /

Execution Plan
————————————————————————————————
Plan hash value: 1647374361

——————————————————————————————————————————————————————————————————————————————
|Id| Operation                |Name  |Rows |Bytes |TempSpc|Cost(%CPU)|Time    |
——————————————————————————————————————————————————————————————————————————————
| 0| SELECT STATEMENT         |      |72900| 2278K|       |776    (1)|00:00:10|
| 1|  TABLE ACCESS BY INDEX...|MASTER|    1|36012 |       |  1    (0)|00:00:01|
|*2|   INDEX UNIQUE SCAN      |  ... |    1|      |       |  0    (0)|00:00:01|
| 3|  SORT ORDER BY           |      |72900| 2278K|  3168K|776    (1)|00:00:10|
|*4|   TABLE ACCESS FULL      |DETAIL|72900| 2278K|       |137    (1)|00:00:02|
———————————————————————————————————————————————————————————————————————————————

Predicate Information (identified by operation id):
—————————————————————————————————————————————————————

   2 - access("MASTER"."ID"=:B1)
   4 – filter("D"."MASTER_ID"=1)

 

On lines 14–19 of Listing 4, I query the detail table and use a scalar subquery to pick up the master record associated with each detail record. Due to the scalar subquery caching that will take place (see the previous article on scalar subqueries referenced above for details), even though I appear to be querying the master table more than 76,000 times, I will query it only once. So the query plan shows me reading and sorting the detail table and then picking up the master record after sorting and returning the rows. The plan suggests that I’ll need about 3 MB of TEMP space—a great difference from the half gigabyte suggested by the prior query, which returns exactly the same result.

When I run these queries with SQL Trace (sql_trace) enabled, I see the difference the scalar subquery can make in this case. Listing 5 includes the TKPROF reports for the queries in order (the join query is first, and the scalar subquery is second).

Code Listing 5: Comparing TKPROF reports for the join query and the scalar subquery 

call     count       cpu    elapsed       disk     query   current        rows
—————— ———————— ————————— —————————— —————————— ————————— ————————— ——————————
Parse        1      0.00       0.00          0         0         0           0
Execute      1      0.00       0.00          0         0         0           0
Fetch     4861     14.55      29.34     323542       426        55       72900
—————— ———————— ————————— —————————— —————————— ————————— ————————— ——————————
total     4863     14.55      29.34     323542       426        55       72900

Row Source Operation
————————————————————————————————————————————————————————————————————————————
SORT ORDER BY (cr=426 pr=323542 pw=323542 time=28592898 us cost=546194 
size=2627607600 card=72900) 
NESTED LOOPS  (cr=426 pr=0 pw=0 time=97803 us cost=138 size=2627607600 
card=72900) 
TABLE ACCESS BY INDEX ROWID MASTER (cr=10 pr=0 pw=0 time=99 us cost=1 
size=36012 card=1) 
INDEX UNIQUE SCAN SYS_C0047319 (cr=1 pr=0 pw=0 time=19 us cost=0 size=0 
card=1)(object id 143020) 
TABLE ACCESS FULL DETAIL (cr=416 pr=0 pw=0 time=41772 us cost=137 size=2332800 
card=72900)

call     count       cpu    elapsed       disk     query   current        rows
——————— ——————— ————————— —————————— —————————— ————————— —————————— —————————
Parse        1      0.00       0.00          0         0         0           0
Execute      1      0.00       0.00          0         0         0           0
Fetch     4861      0.38       0.56          0       429         1       72900
——————— ——————— ————————— —————————— —————————— ————————— ————————— ——————————
total     4863      0.38       0.56          0       429         1       72900

Row Source Operation
————————————————————————————————————————————————————————————————————————————
TABLE ACCESS BY INDEX ROWID MASTER (cr=13 pr=0 pw=0 time=185514 us cost=1 
size=36012 card=1) 
INDEX UNIQUE SCAN SYS_C0047319 (cr=4 pr=0 pw=0 time=72345 us cost=0 size=0 
card=1)(object id 143020) 
SORT ORDER BY (cr=429 pr=0 pw=0 time=439523 us cost=776 size=2332800 
card=72900) 
TABLE ACCESS FULL DETAIL (cr=416 pr=0 pw=0 time=29926 us cost=137 size=2332800 
card=72900)

 

As you can see in the reporting in Listing 5, the first query performed hundreds of thousands of I/Os—physical writes (pw in the row source operation) as well as physical reads (pr). These were reads from and writes to TEMP. Additionally, the amount of CPU consumed by the query was quite large—more than 14 CPU seconds—from having to shuffle around 36 K records instead of 40-byte records in memory and on disk. On the other hand, the second query utilized scalar subqueries and did no physical I/O in this example—in real life, it might have to perform some physical I/Os, but very few of them. Also, the amount of CPU used to sort 72,000 records of 40 bytes apiece was significantly smaller than the CPU used to sort the 36 K records. All in all, the query went from taking almost half a minute down to only half a second.

Does this mean that scalar subquery caching is the answer to all of your performance problems? Absolutely not. If the query returned many thousands of master records (a more typical circumstance when each master record has few detail records), looking up each one by an index would soon outweigh the benefits of sorting the details by themselves. However, scalar subqueries are a tool you can use in certain cases—such as those detailed in the earlier article on scalar subqueries. Consider them in special cases like this.


Tom Kyte Headshot


Tom Kyte is a database evangelist in Oracle’s Server Technologies division and has worked for Oracle since 1993. He is the author of Expert Oracle Database Architecture (Apress, 2005, 2010) and Effective Oracle by Design (Oracle Press, 2003), among other books.

 

 



Send us your comments