The Inefficiency of Direct-Path Bulk Bind Inserts

by Paul Guerin

Beware of the storage inefficiency of small bulk bind inserts that use the direct path.

Published November 2017


Oracle Database offers a number of different methods to load many rows into a table.

If you are loading a collection (for example, an associative array, a varray, or a nested table) into a table, the simplest method is to do the insertion within a loop such as the following:

-- PL/SQL sends a SQL statement to the SQL engine for each insert.
-- Unfortunately context switches between PL/SQL and SQL engines lead to slow performance.
FOR j IN id.FIRST .. id.LAST LOOP
  INSERT INTO &tbl
  VALUES (
    id(j),
    SYSDATE,
    SYS_CONTEXT('USERENV', 'SESSION_USER'),
    SYS_CONTEXT('USERENV', 'SID'),
    'new'
  );
END LOOP;

While simple, this type of insert inside a loop cannot take advantage of the direct path. Additionally, statement-by-statement processing in a loop, with a context switch between the PL/SQL and SQL engines for every row, is not as efficient as batched SQL statements.

However, there is a method to efficiently load a collection into a table. Just as a bulk load can efficiently load rows from one table into another, a bulk bind (that is, a FORALL statement) of insert statements can efficiently batch-load a collection into a table.

A bulk bind is not limited to insert statements: delete and update statements can be bulk bound as well, as sketched below.
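
For illustration, the following is a minimal sketch of a bulk bound update and delete. It assumes the same id collection used in the examples in this article; the record_id and status column names are hypothetical:

-- Sketch only: record_id and status are assumed column names.
FORALL j IN id.FIRST .. id.LAST
  UPDATE &tbl
  SET status = 'processed'
  WHERE record_id = id(j);

FORALL j IN id.FIRST .. id.LAST
  DELETE FROM &tbl
  WHERE record_id = id(j);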

Just as there are two types of bulk load, there are two types of insert bulk bind:

  • Conventional insert bulk bind
  • Direct-path insert bulk bind

Both types of insert bulk bind are achieved with the FORALL statement. The following is an example of a conventional insert bulk bind:

-- Example of a conventional bulk bind insert.
-- Only one context switch per batch, rather than one per row in the collection.
FORALL j IN id.FIRST .. id.LAST
INSERT INTO &tbl
VALUES (
  id(j),
  SYSDATE,
  SYS_CONTEXT('USERENV', 'SESSION_USER'),
  SYS_CONTEXT('USERENV', 'SID'),
  'new'
);

A bulk bind of insert statements can also use the direct path via the APPEND_VALUES hint.

The APPEND_VALUES hint applies to insert statements with a VALUES clause, which is exactly the form a bulk bind insert takes. The following is an example of a direct-path bulk bind:

-- Example of a bulk bind that uses the direct path.
FORALL j IN id.FIRST .. id.LAST
INSERT /*+ APPEND_VALUES */ INTO &tbl
VALUES (
  id(j),
  SYSDATE,
  SYS_CONTEXT('USERENV', 'SESSION_USER'),
  SYS_CONTEXT('USERENV', 'SID'),
  'new'
);

The advantage of using the direct path for inserts is the potential performance benefit, especially if you are using the nologging option.
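
As an aside, here is a minimal sketch of enabling the nologging option on the target table. Note that a direct-path insert still generates redo if the database or tablespace is in force logging mode:

-- Subsequent direct-path inserts then generate minimal redo (sketch).
ALTER TABLE &tbl NOLOGGING;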

A direct-path bulk load should be used only to insert a significant number of rows. Should this also be the case for direct-path bulk binds?

What would be the consequences of direct-path bulk bind inserting a small collection into a table?

Storage Efficiency of a Conventional Bulk Bind Insert

As a benchmark, let's insert a 10-element collection into a table, and then repeat the insert until there is a total of 1,000,000 rows in the table.

So, with a 10-element collection, each batch of statements inserts 10 rows at a time, and the batch is repeated 100,000 times:

-- Conventional bulk bind insert of the 10-element collection.
FORALL i IN id.FIRST..id.LAST
INSERT INTO &tbl
VALUES (
   sysdate,
   id(i),
   x,
   x,
   null,
   TO_CHAR(z)
);
commit;
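
For context, the following is a minimal sketch of how such a benchmark harness might look. The collection type, the values assigned to id, x, and z, and the loop bounds are assumptions for illustration, not the exact harness behind these results:

DECLARE
  TYPE t_id IS TABLE OF NUMBER INDEX BY PLS_INTEGER;
  id t_id;
  x  VARCHAR2(64) := SYS_CONTEXT('USERENV', 'SESSION_USER');
  z  NUMBER := 0;
BEGIN
  -- 100,000 batches of a 10-element collection = 1,000,000 rows in total.
  FOR batch IN 1 .. 100000 LOOP
    FOR i IN 1 .. 10 LOOP
      z := z + 1;
      id(i) := z;
    END LOOP;
    -- Conventional bulk bind insert of one batch.
    FORALL i IN id.FIRST .. id.LAST
      INSERT INTO &tbl
      VALUES (sysdate, id(i), x, x, null, TO_CHAR(z));
    COMMIT;
  END LOOP;
END;
/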

Once the bulk bind insert is complete, we need to verify that there are 1 million rows in the table.

Also, we'll determine the segment size of the table needed to store 1 million rows.

select count(*)/1000/1000 M_of_rows from &tbl;

 M_OF_ROWS
----------
         1

SQL>

select bytes/1024/1024 MB
from dba_segments
where owner=upper('&schm') and segment_name = upper('&tbl');

        MB
----------
        42

SQL>

So, there are 1 million rows inside a table of only 42 MB.

We can also compute simple statistics and a histogram of the rows per block in the segment, from which the median number of rows per block can be determined:

SELECT avg(rows_per_block) "avg_rows_per_block",
   median(rows_per_block) "median_rows_per_block",
   stddev(rows_per_block) "standard deviation"
FROM (
  SELECT
    count(*) rows_per_block,
    dbms_rowid.rowid_relative_fno(rowid),
    dbms_rowid.rowid_block_number(rowid)
  FROM &tbl
  GROUP BY dbms_rowid.rowid_relative_fno(rowid), dbms_rowid.rowid_block_number(rowid)
);

avg_rows_per_block median_rows_per_block standard deviation
------------------ --------------------- ------------------
        191.534189                   191         1.80161418

SQL>

col "no rows" heading "no|rows"
col bucket1 heading "1-9|rows"
col bucket2 heading "10|rows"
col bucket3 heading "11-19|rows"
col bucket4 heading "20|rows"
col bucket5 heading "21-185|rows"
col bucket6 heading "186-195|rows"
col bucket7 heading "196-205|rows"
col bucket8 heading "206-215|rows"
col bucket_max heading ">=216|rows"
SELECT
  MIN((SELECT SUM(blocks) FROM dba_extents
       WHERE owner = UPPER('&schm') AND segment_name = UPPER('&tbl')) - acc_blocks) "no rows",
  SUM(CASE WHEN rows_per_block BETWEEN 1 AND 9 THEN 1 ELSE NULL END) bucket1,
  SUM(CASE WHEN rows_per_block BETWEEN 10 AND 10 THEN 1 ELSE NULL END) bucket2,
  SUM(CASE WHEN rows_per_block BETWEEN 11 AND 19 THEN 1 ELSE NULL END) bucket3,
  SUM(CASE WHEN rows_per_block BETWEEN 20 AND 20 THEN 1 ELSE NULL END) bucket4,
  SUM(CASE WHEN rows_per_block BETWEEN 21 AND 185 THEN 1 ELSE NULL END) bucket5,
  SUM(CASE WHEN rows_per_block BETWEEN 186 AND 195 THEN 1 ELSE NULL END) bucket6,
  SUM(CASE WHEN rows_per_block BETWEEN 196 AND 205 THEN 1 ELSE NULL END) bucket7,
  SUM(CASE WHEN rows_per_block BETWEEN 206 AND 215 THEN 1 ELSE NULL END) bucket8,
  SUM(CASE WHEN rows_per_block >= 216 THEN 1 ELSE NULL END) bucket_max
FROM (
  -- count the rows in each block
  SELECT
    count(*) rows_per_block,
    dbms_rowid.rowid_relative_fno(rowid),
    dbms_rowid.rowid_block_number(rowid),
    COUNT(*) OVER (ORDER BY dbms_rowid.rowid_relative_fno(rowid), 
            dbms_rowid.rowid_block_number(rowid)) acc_blocks
  FROM &tbl
  GROUP BY dbms_rowid.rowid_relative_fno(rowid), dbms_rowid.rowid_block_number(rowid)
);

        no        1-9         10      11-19         20     21-185    186-195    196-205    206-215      >=216
      rows       rows       rows       rows       rows       rows       rows       rows       rows       rows
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
       155                                                      1       4712        503          5

SQL>

The median number of rows per block is 191, and the histogram confirms that most blocks have between 186 and 195 rows. These statistics become the benchmark for a conventional bulk bind insert.

Storage Efficiency of a Direct-Path Bulk Bind Insert

This time, we'll bulk bind insert the same 10-element collection into a new table. Again, the insert will be repeated 100,000 times to obtain 1,000,000 rows.

Rather than using a conventional bulk bind insert, we'll use the APPEND_VALUES hint to invoke a direct-path bulk bind insert:

  FORALL i IN id.FIRST..id.LAST      -- direct-path bulk bind insert
  INSERT /*+ APPEND_VALUES */ INTO &tbl
  VALUES (
   sysdate,
   id(i),
   x,
   x,
   null,
   TO_CHAR(z)
  );
  -- commit to avoid possible ORA-12838
  commit;

The commit after each batch is required because, following a direct-path insert, the session cannot read or modify the table until the transaction is committed (otherwise ORA-12838 is raised). Once the bulk bind is complete, confirm there are 1 million rows in the table and determine the segment size.

 M_OF_ROWS
----------
         1

SQL>

        MB
----------
       792

SQL>

You can see that the segment size for 1 million rows has grown to 792 MB (from 42 MB previously), which suggests massive storage inefficiency!
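
One way to confirm where the space has gone is DBMS_SPACE.UNUSED_SPACE, which reports the space above the high-water mark. A small UNUSED_BLOCKS value confirms that the 792 MB consists mostly of formatted but sparsely filled blocks below the high-water mark. A minimal sketch:

set serveroutput on
DECLARE
  l_total_blocks  NUMBER;
  l_total_bytes   NUMBER;
  l_unused_blocks NUMBER;
  l_unused_bytes  NUMBER;
  l_file_id       NUMBER;
  l_block_id      NUMBER;
  l_last_block    NUMBER;
BEGIN
  DBMS_SPACE.UNUSED_SPACE(
    segment_owner             => UPPER('&schm'),
    segment_name              => UPPER('&tbl'),
    segment_type              => 'TABLE',
    total_blocks              => l_total_blocks,
    total_bytes               => l_total_bytes,
    unused_blocks             => l_unused_blocks,
    unused_bytes              => l_unused_bytes,
    last_used_extent_file_id  => l_file_id,
    last_used_extent_block_id => l_block_id,
    last_used_block           => l_last_block);
  -- Blocks below the high-water mark = total blocks - unused blocks.
  dbms_output.put_line('blocks below the HWM: ' || (l_total_blocks - l_unused_blocks));
END;
/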

Looking at the simple statistics and the histogram indicates where the inefficiency lies:

avg_rows_per_block median_rows_per_block standard deviation
------------------ --------------------- ------------------
                10                    10                  0

SQL>

        no        1-9         10      11-19         20     21-185    186-195    196-205    206-215      >=216
      rows       rows       rows       rows       rows       rows       rows       rows       rows       rows
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
      1376     100000

SQL>

So, when you are using a direct-path bulk bind insert of a 10-element collection into a table, there are only 10 rows per block.

Compare this with the conventional bulk bind insert, which obtained 191 rows per block.

The inefficiency arises because each direct-path bulk bind insert writes only as many rows as the collection contains (here, just 10 rows per batch).

The direct path always inserts above the high-water mark of the table, so every batch formats at least one new block, even though there is plenty of free space in the blocks below the high-water mark.
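
A rough way to observe this effect is to count the distinct blocks that contain rows: with direct-path batches of 10 rows, the count should be close to the number of batches (100,000 here). A sketch:

-- Each (file, block) pair identifies one block that contains rows.
SELECT COUNT(DISTINCT dbms_rowid.rowid_relative_fno(rowid) || '.' ||
             dbms_rowid.rowid_block_number(rowid)) blocks_with_rows
FROM &tbl;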

Storage Efficiency of a Direct-Path Bulk Bind Insert of a Larger Collection

This time we'll do a direct-path bulk bind insert of a collection with 2,000 elements into a new table. With 2,000 inserts per direct-path batch, only 500 batches are needed to reach 1,000,000 rows:

FORALL i IN id.FIRST..id.LAST      -- direct-path bulk bind insert
INSERT /*+ APPEND_VALUES */ INTO &tbl
VALUES (
   sysdate,
   id(i),
   x,
   x,
   null,
   TO_CHAR(z)
);
-- commit to avoid possible ORA-12838
commit;

After the bulk bind is complete, we'll verify that there are 1 million rows in the table and see what the new segment size is:

 M_OF_ROWS
----------
         1

SQL>

        MB
----------
        40

SQL>

The output shows that the segment size for 1 million rows has returned to close to the conventional bulk bind benchmark of 42 MB (down from 792 MB). The block statistics confirm that storage efficiency is much higher for the larger collection:

avg_rows_per_block median_rows_per_block standard deviation
------------------ --------------------- ------------------
               200                   200         9.03683201

SQL>

        no        1-9         10      11-19         20     21-185    186-195    196-205    206-215      >=216
      rows       rows       rows       rows       rows       rows       rows       rows       rows       rows
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
       120                                          99                  4010        891

SQL>

Now there are on average 200 rows per block for the direct-path bulk bind insert of a large collection.

Again, this is close to the benchmark of 191 rows per block for the conventional bulk bind insert.

Conclusion

In conclusion, when using a direct-path bulk bind insert, the larger the collection, the more efficiently the rows are stored in the table.

Higher row density also means more rows can be queried per block. For example, at 10 rows per block a query scanning all 1 million rows must visit about 100,000 blocks, whereas at 200 rows per block it visits only about 5,000.

Fewer blocks queried means less I/O, which leads to higher levels of application performance.

About the Author

Paul Guerin is an international consultant who specializes in Oracle Database performance. He is based at a global delivery center in Southeast Asia, but has clients from Australia, Europe, Asia, and the United States. Moreover, he has presented at some of the world's leading Oracle conferences, including Oracle OpenWorld 2013. His work has also been featured in the Independent Oracle Users Group (IOUG) Tips and Best Practices Booklet of 2015 and via several database community articles of the My Oracle Support Community. He is an Oracle Certified Professional DBA and continues to be a participant of the Oracle ACE program.