By Steven Feuerstein
FORALL begins, BULK COLLECT builds, and VALUES OF excels.
In my second article on PL/SQL enhancements in Oracle Database 10g, I will explore the new capability of using the FORALL statement with nonconsecutive driving indexes. (For the first article, see the November/December 2003 issue.) While this might at first sound fairly esoteric, it is actually a key enhancement to the FORALL statement that can greatly simplify code and improve performance in programs that update large numbers of rows of data from within a PL/SQL program.
I will first review the capabilities of FORALL and then examine the impact of Oracle Database 10g's support for nonconsecutive driving indexes.
Oracle introduced two new data manipulation language (DML) statements for PL/SQL in Oracle8i: BULK COLLECT and FORALL . Both statements implement a form of array processing inside PL/SQL; BULK COLLECT facilitates high-speed retrieval of data, and FORALL greatly improves performance of INSERT, UPDATE , and DELETE operations. Oracle Database achieves significant performance gains with these statements by greatly reducing the number of context switches between the PL/SQL and SQL statement execution engines.
With BULK COLLECT, you fetch multiple rows into one or more collections, rather than individual variables or records. The following example of BULK COLLECT retrieves all books containing "PL/SQL" in their titles into an associative array of records, all in a single pass to the database.
DECLARE TYPE books_aat IS TABLE OF book%ROWTYPE INDEX BY PLS_INTEGER; books books_aat; BEGIN SELECT * BULK COLLECT INTO books FROM book WHERE title LIKE '%PL/SQL%'; ... END;
Similarly, FORALL transfers data from a PL/SQL collection to the specified table using collections. The following code example shows a procedure that accepts a nested table of book information and inserts the full contents of that collection (the binding array) into the book table. Note that this example also takes advantage of Oracle9i's FORALL enhancement that allows the INSERT of a record directly into a table.
BULK COLLECT and FORALL are very helpful, not only in improving performance but also in simplifying the code you need to write for SQL operations in PL/SQL. The following FORALL INSERT of multiple rows demonstrates rather clearly why PL/SQL is considered to be the best programming language for the Oracle database.
CREATE TYPE books_nt IS TABLE OF book%ROWTYPE; / CREATE OR REPLACE PROCEDURE add_books ( books_in IN books_nt) IS BEGIN FORALL book_index IN books_in.FIRST .. books_in.LAST INSERT INTO book VALUES books_in(book_index); ... END;
Prior to Oracle Database 10g, however, there was an important restriction on using collections with FORALL : The database read the contents of the collection sequentially from the first to the last row in the IN range clause. If it encountered a row within that range that was undefined, Oracle Database would raise the ORA-22160 exception:
ORA-22160: element at index [N] does not exist
For simple applications of FORALL , this rule would not cause any difficulties. If, however, you want to take full advantage of FORALL in the widest possible range of circumstances, the requirement that any FORALL driving array be sequentially populated might increase program complexity and degrade performance.
In Oracle Database 10g, PL/SQL now offers two new clauses in the FORALL statement, INDICES OF and VALUES OF , which allow you to choose very selectively which rows from the driving array should be processed by the extended DML statement.
INDICES OF comes in very handy when your binding array is sparse or contains gaps. The syntax for this statement is:
FORALL indx IN INDICES OF sparse_collection INSERT INTO my_table VALUES sparse_collection (indx);
VALUES OF responds to a different situation: My binding array may or may not be sparse, but I want to use only a subset of the elements in that array. I can then use VALUES OF to point to just the values I wish to use in my DML operation. The syntax for this statement is:
FORALL indx IN VALUES OF pointer_array INSERT INTO my_table VALUES binding_array (indx);
In the remainder of this article, I will take a look at how to move from a CURSOR FOR loop to a FORALL implementation and how the VALUES OF clause can ease your programming tasks.
Suppose I need to write a procedure that gives raises to employees who are eligible (as determined by the comp_analysis.is_eligible function) and writes reports to the employee_history table for any employees who are ineligible for that raise. I work for a very large corporation; we have lots and lots of employees.
This is not a terribly challenging task for a PL/SQL developer. I don't even need to use BULK COLLECT or FORALL to get the job done, as I demonstrate in Listing 1. Instead, I use a CURSOR FOR loop and individual INSERT and UPDATE statements. This is straightforward code; unfortunately, I have been given 10 minutes in which to run this code, and my "old-fashioned" approach is running 30 minutes or more.
Fortunately, my company upgraded to Oracle9i and, even more fortunately, I learned about bulk processing at a recent Oracle seminar (as well as from the excellent demonstrations provided on Oracle Technology Network). So I decide to rewrite my program, using collections and bulk processing. The result is shown in Listing 2.
A quick glance at Listings 1 and 2 should make one point very clear: converting to collections and bulk processing can increase the volume and complexity of your code. If you need a serious boost in performance, however, that increase is well-justified. Now let's move beyond the quick glance and see what it takes to handle the conditional logic inside a CURSOR FOR loop when using FORALL .
In Listing 2, the first part of the declaration section (lines 6 through 11) defines several different collection TYPE s, one for each of the columns I will be retrieving from the employee table. I would rather declare a single collection TYPE based on employee%ROWTYPE , but FORALL does not yet support operations on collections of records in which I reference individual fields. So I still must declare a separate collection for each column for employee IDs, salaries, and hire dates.
Next, I will declare the collections I need for each of those columns (lines 13 through 21). First, I define the collections corresponding to the queried columns (lines 13 through 15):
employee_ids employee_aat; salaries salary_aat; hire_dates hire_date_aat;
I then need a new collection that holds the IDs only of those employees who are approved for a raise (line 17):
Finally, I declare one collection for each column that will be used to record those who were ineligible for raises (lines 19 through 21):
denied_employee_ids employee_aat; denied_salaries salary_aat; denied_hire_dates hire_date_aat;
With these data structures in place, let's now skip down to the executable section of this procedure (lines 72 through 75), to begin to understand how the collections are used to speed up my process.
retrieve_employee_info; partition_by_eligibility; add_to_history; give_the_raise;
I have used step-wise refinement (also known as "top-down design") to write this program. So rather than have a very long and hard-to-follow executable section, I have a four-line section that describes each step in my procedure by name. First, I retrieve the employee information (all the employees in the specified department). Then I partition or separate out those employees who will receive a raise and those who are denied. Once partitioned, I can add the denied employees to the employee history table and give a raise to the rest.
Writing code in this fashion makes the end result much more readable. I can then drill down into whichever area of the program is of concern to me.
With my collections declared, I can now use BULK COLLECT to retrieve the employee information (lines 23 through 30). This effectively replaces the CURSOR FOR loop. Now my data is loaded into my collections.
The partition logic (lines 32 through 46) requires that I go through each row in the collections I just populated and check the eligibility for a raise. If approved, then I copy the employee ID from my query-populated collection to the approved collection. If the person is not eligible, I copy the employee ID, salary, and hire date, because all are needed for the insert into the employee_history table.
With my original data now separated into two collections, I can use each of those collections as drivers for two different FORALL statements (starting on lines 51 and 66). I insert the denied collection data in bulk to the employee_history table ( add_to_history ), and I update the approved employee information in bulk in the employee table in the give_the_raise procedure.
Let's finish our examination of this rewrite with a closer look at add_to_history (lines 48 through 61). The FORALL statement (line 51) contains an IN clause that specifies the range of row numbers to be used in the bulk INSERT . In my explanation of the second rewrite of my procedure, I will refer to the collection used to define that range as the "driving collection." In this version of add_to_history , however, I simply say: use all the rows defined in denied_employee_ids . Within the INSERT itself, I use all three of the denied collections; I will refer to these collections as the "data collections." As you can see, the driving and data collections do not need to match. This is a critical point when learning to use the new features in Oracle Database 10g.
The bottom line, then, of my rewrite in Listing 2 is that I have approximately twice the number of lines of code as in Listing 1 , but the Listing 2 code runs within the required window of time. Prior to Oracle Database 10g, in such a situation I would simply be glad that I was able to fit within the window and move on to my next tasks.
With the latest version of PL/SQL in Oracle Database 10g, however, I now have further opportunities for improvement, in terms of performance, readability, and code volume.
In Oracle Database 10g, you can now specify a subset of rows in the driving collection to be used by a FORALL statement. You can define that subset in one of two ways:
Match the row numbers in the data collection(s) with the row numbers in the driving collection. You would do this with the INDICES OF clause.
Match the row numbers in the data collection(s) with the values found in the defined rows of the driving collection. You would do this with the VALUES OF clause.
I will use the VALUES OF clause in my second and final rewrite of give_raises_in_department. Listing 3 contains the full code for this version. I will skip over the parts of this program that are unchanged from the previous version.
Starting with the declaration of collections, notice that I no longer define additional collections to hold the approved and denied employee information. Instead, in Listing 3 (lines 17 through 21) I declare two "guide" collections: one for approved employees and another for those denied raises. The datatype of both these collections is Boolean; as you will soon see, the datatype of the collections doesn't matter in the least. The only information the FORALL statement cares about is which rows are defined.
|The elapsed times for the three implementations of give_raises_in_department with 50,000 rows in the employee table.||Implementation||Elapsed Time|
|CURSOR FOR loop||00:00:38.01|
|Pre-Oracle Database 10g bulk processing||00:00:06.09|
|Oracle Database 10g bulk processing||00:00:02.06|
|The elapsed times for the three implementations of give_raises_in_department with 100,000 rows in the employee table||Implementation||Elapsed Time|
|CURSOR FOR loop||00:00:58.01|
|Pre-Oracle Database 10g bulk processing||00:00:12.00|
|Oracle Database 10g bulk processing||00:00:05.05|
The retrieve_employee_info subprogram is the same as before, but the way I partition my data is now quite different (lines 32 through 44). Rather than copy records of information from one collection to another (a relatively slow operation), I simply define the row (by assigning it a value of TRUE ) in the appropriate guide collection that matches the row number of the employee IDs collection.
I can now use the approved_list and denied_list collections as the driving collections in two different FORALL statements (starting on lines 49 and 65).
To insert into the employee_history table, I use this formulation:
FORALL indx IN VALUES OF denied_list
And to perform the updates (to give employees their raises), I use this formulation:
FORALL indx IN VALUES OF approved_list
Within both DML statements, the data collections are the original collections populated in the BULK COLLECT retrieval step; no copies were made. With VALUES OF , Oracle Database chooses among the rows in those data collections, using only those rows whose row numbers match those of the driving collection.
By taking advantage of VALUES OF in this program, I was able to avoid copying entire records of information, replacing them with a simple list of row numbers. For large arrays, the overhead of performing these copies can be quite noticeable. To test the advantages of the Oracle Database 10g approach, I loaded the employee table and ran tests with 50,000 and then 100,000 rows of data. I modified the pre-Oracle Database 10g bulk-processing implementation to perform many more copies of collection contents in order to emulate more of a real-world scenario. Then I used SQL*Plus SET TIMING ON to show the elapsed time for running each of my various implementations. Table 1 shows the results.
The conclusions from these timings are clear: Switching from individual DML statements to bulk processing results in greatly reduced elapsed time, from 38 to 6 seconds with 50,000 rows and from 58 to 12 seconds with 100,000 rows. Furthermore, by avoiding the copies of data with VALUES OF , I was able to cut my elapsed time roughly in half.
Even without the improvement in performance, VALUES OF and its sister clause, INDICES OF , increase the flexibility of the PL/SQL language and make it easier for developers to write more-intuitive and more-maintainable code.
PL/SQL is, at this point in the product's life, a mature and powerful language. As a consequence, many of its new features are incremental and evolutionary. Nevertheless, these new features can make a significant difference in an application's performance and a developer's productivity. VALUES OF is a fine example of such a feature.
Steven Feuerstein (email@example.com) is an authority on the PL/SQL language. He is the author of nine books on PL/SQL (all from O'Reilly & Associates, oracle.oreilly.com), including Oracle PL/SQL Best Practices and Oracle PL/SQL Programming. He offers training on the language and is a senior technology adviser for Quest Software.