As Published In
Oracle Magazine
May/June 2007

TECHNOLOGY: Ask Tom


On Loading and Extents

By Tom Kyte Oracle Employee ACE

Our technologist follows the direct path and optimizes extent usage.

We are loading lots of files (49,000+) into a database, using SQL*Loader in parallel and direct path mode. Our tablespaces are locally managed, with an extent size defined as 4MB. The load files vary in size, but the size averages around 380K. What we see happening is that Oracle Database allocates a new extent for each file loaded (number of extents per table = number of load files per table), instead of using up available space in extents.

Can you explain how SQL*Loader allocates extents in parallel and direct path mode?

This is a frequently asked question, and the answer is somewhat complex. Fortunately, I have already written up a full response to this in the book Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions (Apress, 2005). The answer has to do with how direct path operations utilize existing space (in short, they do not). So, in this case, Oracle Database would tend to allocate 49,000 4MB extents and load about 380KB into each one!

After you see how Oracle Database allocates space during direct path operations, I'll show two possible solutions to the above issue. One is to use a much smaller uniform extent size, and the other (and easier) solution is to use AUTOALLOCATE extent sizes, which permits Oracle Database to trim the extents back to the smallest size possible.

The following book excerpt utilizes parallel data definition language (DDL), but the extent issue described above follows the same pattern—each load process allocates its own extent to load into. Note that the excerpt has been edited for length and format.

Parallel DDL and Extent Trimming

Parallel DDL and operations such as a direct path load via SQL*Loader rely on direct path operations. That is, the data is not passed to the buffer cache to be written later; rather, an operation such as a CREATE TABLE AS SELECT will create new extents and write directly to them, and the data goes straight from the query to disk in those newly allocated extents. Each parallel execution server performing its part of the CREATE TABLE AS SELECT writes to its own extent. The INSERT /*+ APPEND */ (a direct path insert) writes above a segment's high-water mark (HWM), and each parallel execution server again writes to its own set of extents, never sharing them with other parallel execution servers. Therefore, if you do a parallel CREATE TABLE AS SELECT and use four parallel execution servers to create the table, you will have at least four extents—maybe more. But each of the parallel execution servers will allocate its own extent, write to it, and, when it fills up, allocate another new extent. The parallel execution servers will never use an extent allocated by some other parallel execution server.

This sounds all right at first, but in a data warehouse environment, this can lead to wasted space after a large load. Let's say you want to load 1,010MB of data (about 1GB) and are using a tablespace with 100MB extents. You decide to use 10 parallel execution servers to load this data. Each would start by allocating its own 100MB extent (there will be 10 of them in all) and filling it up. Because each has 101MB of data to load, they would fill up their first extent and then proceed to allocate another 100MB extent, of which they would use 1MB. You now have 20 extents, 10 of which are full and 10 of which have 1MB each, and 990MB is allocated but not used. This space could be used the next time you load, with conventional path operations, but right now you have 990MB of dead space. This is where extent trimming comes in. Oracle Database will attempt to take the last extent of each parallel execution server and trim it back to the smallest size possible.

Extent Trimming and Dictionary-Managed Tablespaces

If you are using legacy dictionary-managed tablespaces, Oracle Database will be able to convert each of the 100MB extents that contain just 1MB of data into 1MB extents. Unfortunately, that would (in dictionary-managed tablespaces) tend to leave 10 noncontiguous 99MB extents free, and because your allocation scheme is for 100MB extents, this 990MB of space would not be very useful. The next allocation of 100MB would likely not be able to use the existing space, because you would have 99MB of free space, followed by 1MB of allocated space, followed by 99MB of free space, and so on. (I will not review the dictionary-managed approach further here.)

Extent Trimming and Locally Managed Tablespaces

Enter locally managed tablespaces (LMTs). There are two types: UNIFORM SIZE, with which every extent in the tablespace is always precisely the same size, and AUTOALLOCATE, with which Oracle Database decides how big each extent should be, by using an internal algorithm. Both of these approaches nicely solve the problem of having 99MB of free space, followed by 1MB of used space, followed by 99MB of free space, and so on, which results in lots of free space that cannot be used. However, they each solve that problem very differently.

The UNIFORM SIZE approach obviates extent trimming altogether. When you use UNIFORM SIZE extents, Oracle cannot perform extent trimming. All extents are of a single size—none can be smaller (or larger) than that single size.

AUTOALLOCATE extents, on the other hand, do support extent trimming, but in an intelligent fashion. They use a few specific extent sizes and have the ability to use space of different sizes—that is, the algorithm permits the use of all free space in the tablespace over time. Unlike the dictionary-managed tablespace, with which if you request a 100MB extent, the request fails if it can find only 99MB of free extents (so close, yet so far), a locally managed tablespace with AUTOALLOCATE extents can be more flexible. It can reduce the size of the request it makes, to attempt to use all of the free space.

Let's now look at the differences between the two LMT approaches. To do that, we need a real-life example to work with. We'll set up an external table capable of being used for parallel direct path loading. Even if you are still using SQL*Loader for parallel direct path loading of data, this section applies entirely—you just have to do manual scripting to actually load the data. So, in order to investigate extent trimming, we need to set up our example load, perform the loads under varying conditions, and examine the results.

Setting up. To get started, we need an external table. I have a legacy control file from SQL*Loader that I used to use to load data: 

                               
LOAD DATA
INFILE '/tmp/big_table.dat'
INTO TABLE big_table
REPLACE
FIELDS TERMINATED BY '|'
(
id, owner, object_name, subobject_name, object_id, data_object_id, object_type, 
created, last_ddl_time, timestamp, status, temporary, generated, secondary
)

                           

We can convert this easily into an external table definition, by using SQL*Loader itself: 
 

$ sqlldr big_table/big_table 
big_table.ctl 
                                external_table=generate_only
SQL*Loader: Release 10.1.0.3.0 - 
Production on Mon Jul 11 14:16:20 2005
Copyright (c) 1982, 2004, Oracle.  
All rights reserved.
                            

Note the EXTERNAL_TABLE parameter passed to SQL*Loader. It causes SQL*Loader to not load data in this case but rather to generate a CREATE TABLE statement in the log file. Listing 1 shows this CREATE TABLE statement. (This is an abridged statement; I've edited out repetitive elements to make the example smaller.)

Code Listing 1: Abridged CREATE TABLE statement 

CREATE TABLE "SYS_SQLLDR_X_EXT_BIG_TABLE"
(
  "ID" NUMBER,
  .
  .
  .
  "SECONDARY" VARCHAR2(1)
)
ORGANIZATION external
(
  TYPE oracle_loader
  DEFAULT DIRECTORY SYS_SQLLDR_XT_TMPDIR_00000
  ACCESS PARAMETERS
  (
    RECORDS DELIMITED BY NEWLINE CHARACTERSET WE8ISO8859P1
    BADFILE 'SYS_SQLLDR_XT_TMPDIR_00000':'big_table.bad'
    LOGFILE 'big_table.log_xt'
    READSIZE 1048576
    FIELDS TERMINATED BY "|" LDRTRIM
    REJECT ROWS WITH ALL NULL FIELDS
    (
      "ID" CHAR(255)
        TERMINATED BY "|",
        .
        .
        .
      "SECONDARY" CHAR(255)
        TERMINATED BY "|"
    )
  )
  location
  (
    'big_table.dat'
  )
)REJECT LIMIT UNLIMITED


All we need to do is edit that a bit to name the external table the way we want —by changing the directories, perhaps, and so on:

 

SQL> create or replace directory 
  2  my_dir as '/tmp/'
  3  /
Directory created.

Code Listing 2: CREATE TABLE statement 

SQL> CREATE TABLE "BIG_TABLE_ET"
  2  (
  3    "ID" NUMBER,
 .
 .
 .
 16    "SECONDARY" VARCHAR2(1)
 17  )
 18  ORGANIZATION external
 19  (
 20    TYPE oracle_loader
 21    DEFAULT DIRECTORY MY_DIR
 22    ACCESS PARAMETERS
 23    (
 24      RECORDS DELIMITED BY NEWLINE CHARACTERSET WE8ISO8859P1
 25      READSIZE 1048576
 26      FIELDS TERMINATED BY "|" LDRTRIM
 27      REJECT ROWS WITH ALL NULL FIELDS
 28    )
 29    location
 30    (
 31      'big_table.dat'
 32    )
 33  )REJECT LIMIT UNLIMITED
 34  /
Table created.

After that, all we need to do is actually create the table, as shown in Listing 2. Then we parallel-enable this table. This is the magic step that will facilitate an easy parallel direct path load: 

SQL> alter table big_table_et PARALLEL;
Table altered.

Extent trimming with UNIFORM versus AUTOALLOCATE locally managed tablespaces. That's all we need to do in terms of setting up the load component. Now we would like to investigate how space is managed in an LMT that uses UNIFORM extent sizes, compared to how space is managed in an LMT that uses AUTOALLOCATE extents. In this case, we'll use 100MB extents. First we create LMT_UNIFORM, which uses uniform extent sizes: 

SQL> create tablespace lmt_uniform
  2  datafile 
  3  '/u03/ora10gr1/lmt_uniform.dbf'
  4  size 1048640K reuse 
  5  autoextend on next 100m
  6  extent management local
  7  uniform size 100m;
Tablespace created.

And next we create LMT_AUTO, which uses AUTOALLOCATE to determine extent sizes: 

SQL> create tablespace lmt_auto
  2  datafile 
  3 '/u03/ora10
                               gr1/lmt_auto.dbf'
  4 size 1048640K reuse 
  5  autoextend on next 100m
  6  extent management local
  7  autoallocate;
Tablespace created.
                            

Each tablespace started with a 1GB datafile (plus 64KB used by LMTs to manage the storage; it would be 128K extra instead of 64K if we were to use a 32K block size). We permit these datafiles to autoextend 100MB at a time. We are going to load the file 

$ ls -lag big_table.dat
-rw-rw-r--    1 tkyte    1067107251 ... 

which is a 10,000,000-record file. It was created with the big_table.sql script, and unloaded with the flat.sql script, available at asktom.oracle.com.

Next, we do a parallel direct path load of this file into each tablespace: 

SQL> create table uniform_test
   
                               2  parallel
  3  tablespace lmt_uniform
  4  as
  5  select * from big_table_et;
Table created.

SQL> create table autoallocate_test
   
                               2  parallel
  3  tablespace lmt_auto
  4  as
  5  select * from big_table_et;
Table created.
                            

On my system, which has four CPUs, these CREATE TABLE statements executed with eight parallel execution servers and one coordinator. I verified that this was the case in Listing 3, by querying one of the dynamic performance views related to parallel execution, V$PX_SESSION, while these statements were running.

Code Listing 3: Query against V$PX_SESSION 

SQL> select sid, serial#, qcsid, qcserial#, degree
  2  from v$px_session;

       SID     SERIAL#  QCSID   QCSERIAL#      DEGREE   
---------- ----------- ------ -----------   ---------
       137          17    154         998           8
       139          13    154         998           8
       141          17    154         998           8
       150         945    154         998           8
       161         836    154         998           8
       138           8    154         998           8
       147          15    154         998           8
       143          41    154         998           8
       154         998    154

9 rows selected.

Note that in creating the UNIFORM_TEST and AUTOALLOCATE_TEST tables, we simply specified PARALLEL on each table, with Oracle Database choosing the degree of parallelism. In this case, I was the sole user of the machine (all resources available) and Oracle Database defaulted to eight parallel execution servers, based on the number of CPUs (four) and the PARALLEL_THREADS_PER_CPU parameter setting, which defaults to two.

The SID and SERIAL# columns are the identifiers of the parallel execution sessions, and the QCSID and QCSERIAL# columns are the identifiers of the query coordinator of the parallel execution. So, with eight parallel execution sessions running, we would like to see how the space was used. A quick query against USER_SEGMENTS, shown in Listing 4, gives us a good idea.

Code Listing 4: Query against USER_SEGMENTS 

SQL> select segment_name, blocks, extents
  2  from user_segments 
  3 where segment_name in ( 'UNIFORM_TEST', 'AUTOALLOCATE_TEST' );
 
SEGMENT_NAME            BLOCKS        EXTENTS
--------------------------------------------- 
UNIFORM_TEST            204800             16
AUTOALLOCATE_TEST       145592            714

Because we were using an 8K block size, Listing 4 shows a difference of about 462MB, or—looking at it from a ratio perspective—AUTOALLOCATE_TEST is about 70 percent of the size of UNIFORM_TEST as far as allocated space goes. If we look at the actual used space, shown in Listing 5, we can see that if we take away the blocks on the freelist for UNIFORM_TEST—59,224 of them—the tables consume about the same amount of actual space but the amount of space needed by the UNIFORM tablespace is considerably more. This is all due to the extent trimming that did not take place. If we look at the extents in UNIFORM_TEST, shown in Listing 6, we see this clearly. Each extent is 100MB.

Code Listing 5: Show space used 

SQL> exec show_space('UNIFORM_TEST' );
Free Blocks........................................................59,224
Total Blocks......................................................204,800
Total Bytes.................................................1,677,721,600
Total MBytes........................................................1,600
Unused Blocks...........................................................0
Unused Bytes............................................................0
Last Used Ext FileId....................................................6
Last Used Ext BlockId...................................................9
Last Used Block....................................................12,800
PL/SQL procedure successfully completed.
 
SQL> exec show_space('AUTOALLOCATE_TEST' );
Free Blocks............................................................16
Total Blocks......................................................145,592
Total Bytes.................................................1,192,689,664
Total MBytes........................................................1,137
Unused Blocks...........................................................0
Unused Bytes............................................................0
Last Used Ext FileId....................................................8
Last Used Ext BlockId..................................................41
Last Used Block.........................................................8
PL/SQL procedure successfully completed.

Code Listing 6: Extents in UNIFORM_TEST 

SQL> select segment_name, extent_id, blocks
  2  from user_extents where segment_name = 'UNIFORM_TEST';
 
SEGMENT_NAME            EXTENT_ID      BLOCKS
--------------------------------------------
UNIFORM_TEST                    0       12800
UNIFORM_TEST                    1       12800
UNIFORM_TEST                    2       12800
UNIFORM_TEST                    3       12800
UNIFORM_TEST                    4       12800
UNIFORM_TEST                    5       12800
UNIFORM_TEST                    6       12800
UNIFORM_TEST                    7       12800
UNIFORM_TEST                    8       12800
UNIFORM_TEST                    9       12800
UNIFORM_TEST                   10       12800
UNIFORM_TEST                   11       12800
UNIFORM_TEST                   12       12800
UNIFORM_TEST                   13       12800
UNIFORM_TEST                   14       12800
UNIFORM_TEST                   15       12800
 
16 rows selected.

Now, it would be a waste to list all 714 extents in AUTOALLOCATE_TEST, so let's look at them in aggregate, as shown in Listing 7. This generally fits in with how LMTs with AUTOALLOCATE are observed to allocate space. The 8-, 128-, and 1,024-block extents are "normal"; we always observe them with AUTOALLOCATE. The rest, however, are not normal, and we do not usually observe them. They are due to the extent trimming that takes place. Some of the parallel execution servers finished their part of the load—they took their last 8MB (1,024-block) extent and trimmed it, resulting in a leftover bit. One of the other parallel execution sessions, when it needed space, could use this spare bit. In turn, as these other parallel execution sessions finished processing their own loads, they would trim their last extent and leave spare bits of space.

Code Listing 7: Extents in AUTOALLOCATE_TEST 

SQL> select segment_name, blocks, count(*)
  2  from user_extents
  3  where segment_name = 'AUTOALLOCATE_TEST'
  4  group by segment_name, blocks
  5  /
 
SEGMENT_NAME             BLOCKS     COUNT(*)
--------------------------------------------
AUTOALLOCATE_TEST             8          128
AUTOALLOCATE_TEST           128          504
AUTOALLOCATE_TEST           240            1
AUTOALLOCATE_TEST           392            1
AUTOALLOCATE_TEST           512            1
AUTOALLOCATE_TEST           656            1
AUTOALLOCATE_TEST           752            5
AUTOALLOCATE_TEST           768            1
AUTOALLOCATE_TEST          1024           72
 
9 rows selected.
 

Next Steps



ASK Tom
Oracle Vice President Tom Kyte answers your most difficult technology questions. Highlights from that forum appear in this column.
 asktom.oracle.com

READ more Tom
Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions

 DOWNLOAD sample data script for this column

So, which approach should you use? If your goal is to do direct path loading in parallel as often as possible, I suggest using AUTOALLOCATE as your extent management policy. Parallel direct path operations such as this will not use space under the object's HWM—the space on the freelist. So, unless you do some conventional path inserts into these tables also, UNIFORM allocation will permanently contain additional free space that it will never use. Unless you can size the extents for the UNIFORM LMT to be much smaller, you will see what I would term excessive wasted space over time—and remember that this space is associated with the segment and will be included in a full scan of the table.

To demonstrate this, let's do another parallel direct path load into these existing tables, using the same inputs: 

SQL> alter session enable 
  2  parallel dml;
Session altered.
 
SQL> insert /*+ append */ 
  2  into UNIFORM_TEST 
  3  select * from big_table_et;
10000000 rows created.
 
SQL> insert /*+ append */ 
  2  into AUTOALLOCATE_TEST 
  3  select * from big_table_et;
10000000 rows created.
 
SQL> commit;
Commit complete.

If we compare the space utilization of the two tables after that operation, shown in Listing 8, we can see that as we load more and more data into the UNIFORM_TEST table, using parallel direct path operations, the space utilization gets worse over time. We would want to use a significantly smaller uniform extent size or use AUTOALLOCATE for extents. AUTOALLOCATE may well generate more extents over time, but the space utilization is superior, due to extent trimming.

Code Listing 8: Compare space utilization of UNIFORM and AUTOALLOCATE extents 

SQL> exec show_space( 'UNIFORM_TEST' );
Free Blocks........................................................118,463
Total Blocks.......................................................409,600
Total Bytes..................................................3,355,443,200
Total MBytes.........................................................3,200
Unused Blocks............................................................0
Unused Bytes.............................................................0
Last Used Ext FileId.....................................................6
Last Used Ext BlockId..............................................281,609
Last Used Block.....................................................12,800

PL/SQL procedure successfully completed.
 
SQL> exec show_space( 'AUTOALLOCATE_TEST' );
Free Blocks.............................................................48
Total Blocks.......................................................291,184
Total Bytes..................................................2,385,379,328
Total MBytes.........................................................2,274
Unused Blocks............................................................0
Unused Bytes.............................................................0
Last Used Ext FileId.....................................................8
Last Used Ext BlockId..............................................140,025
Last Used Block..........................................................8
 
PL/SQL procedure successfully completed.

 



Tom Kyte
has worked for Oracle since 1993. He is a vice president in the Oracle Public Sector group and the author of Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions (Apress, 2005) and Effective Oracle by Design (Oracle Press, 2003), among others.


Send us your comments