TECHNOLOGY: PL/SQL PracticesOn Working in ParallelBy Steven Feuerstein Finish faster with DBMS_PARALLEL_EXECUTE. I have to perform DML [data manipulation language] operations on millions of rows of data. Writing the SQL statement to perform the changes isn’t hard, but managing rollback segments and getting the process to finish in an acceptable amount of time are. Does Oracle Database 11g Release 2 offer any new features that can help? Each new version of Oracle Database brings an ever-widening array of built-in or supplied packages that extend the capabilities of Oracle Database. Oracle Database 11g Release 2 is no exception: it contains more than 50 new packages (all documented in Oracle Database PL/SQL Packages and Types Reference ), including DBMS_PARALLEL_EXECUTE. DBMS_PARALLEL_EXECUTE now provides the ability to break up a large table according to a variety of criteria, from ROWID ranges to key values and user-defined methods. You can then run a SQL statement or a PL/SQL block against these different “chunks” of the table in parallel, using the database scheduler to manage the processes running in the background. Error logging, automatic retries, and commits are integrated into the processing of these chunks. To use DBMS_PARALLEL_EXECUTE to run tasks in parallel, your schema will need the CREATE JOB system privilege. You can then use the following subprograms of the built-in package (these are the most commonly used of the package’s routines) to achieve your goal: • CREATE_TASK creates a named task to be managed by DBMS_PARALLEL_EXECUTE. All these subprograms and any others in DBMS_PARALLEL_EXECUTE—except for TASK_STATUS—perform a commit. Let’s look at a few examples of how to use this package. We will start with the simplest approach: chunking by ROWID. Suppose I need to apply a raise in salary (specified by a percentage) to all the employees in our company. In case my parallelized task fails for some reason, I want to be able to retry a specified number of times to complete it. Listing 1 displays the code for implementing a parallelized apply_raise procedure. Code Listing 1: Chunking by ROWID in the apply_raise procedure
SQL> PROCEDURE apply_raise ( 2 pct_in IN NUMBER 3 , retries_in IN PLS_INTEGER DEFAULT 2 4 ) 5 IS 6 c_update_statement CONSTANT VARCHAR2 (1000) 7 := 'UPDATE /*+ ROWID (dda) */ EMPLOYEES emp 8 SET emp.salary = emp.salary * (1.0 + pct_in/100) 9 WHERE ROWID BETWEEN :starting_rowid AND :ending_rowid'; 10 c_task_name CONSTANT VARCHAR2 (20) := 'Give Raise'; 11 l_attempts PLS_INTEGER := 1; 12 BEGIN 13 DBMS_PARALLEL_EXECUTE.CREATE_TASK (c_task_name); 14 15 DBMS_PARALLEL_EXECUTE. 16 CREATE_CHUNKS_BY_ROWID (task_name => c_task_name 17 , table_owner => USER 18 , table_name => 'EMPLOYEES' 19 , by_row => TRUE 20 , chunk_size => 1000 21 ); 22 23 DBMS_PARALLEL_EXECUTE. 24 RUN_TASK (task_name => c_task_name 25 , sql_stmt => c_update_statement 26 , language_flag => DBMS_SQL.native 27 , parallel_level => 10 28 ); 29 30 LOOP 31 EXIT WHEN DBMS_PARALLEL_EXECUTE.TASK_STATUS (c_task_name) = 32 DBMS_PARALLEL_EXECUTE.FINISHED 33 OR l_attempts > retries_in; 34 l_attempts := l_attempts + 1; 35 DBMS_PARALLEL_EXECUTE.RESUME_TASK (c_task_name); 36 END LOOP; 37 38 DBMS_PARALLEL_EXECUTE.DROP_TASK (c_task_name); 39* END apply_raise; Table 1 explains the use of DBMS_PARALLEL_EXECUTE subprograms in the apply_raise procedure in Listing 1. The steps in lines 6 through 28 create the task, specify chunking, and run the task. If you are sure the task will complete or if you do not want to recover from any failures, you can simply proceed to line 38 and drop the task.
Given that any number of errors can occur during execution of DML statements, however, you may want to build into your parallel task execution the ability to check the status of the task and resume it if a failure has occurred. Lines 30 through 35 in the apply_raise procedure address that challenge. As you can see, DBMS_PARALLEL_EXECUTE offers an elegant, high-level API for specifying the parallel execution of a DML statement. Chunking by User SQL StatementAs an alternative to chunking by ROWID, you can provide your own query to specify how you want your data chunked and modified in parallel. The query must contain two columns—start_id and end_id—both of which must be ROWIDs or numbers. Each row retrieved from this query against the “chunking table” must specify the start and end values of the chunk.
Suppose I want to execute my update against employees by ranges of department IDs. I can create a chunking table (see Listing 2) and then modify my apply_raise procedure. To do this, first I change the WHERE clause of the update statement to specify ranges of department IDs:
c_update_statement CONSTANT VARCHAR2 (1000) := 'UPDATE EMPLOYEES emp SET emp.salary = emp.salary * (1.0 + pct_in/100) WHERE department_id BETWEEN :starting_deptid AND :ending_deptid' ; Then I define the query used to specify the chunking rows:
c_chunk_statement CONSTANT VARCHAR2 (1000) := 'SELECT start_id, end_id FROM department_chunks'; Finally, I create chunks by this SQL statement, specifying that ROWIDs are not used:
DBMS_PARALLEL_EXECUTE. CREATE_CHUNKS_BY_SQL ( task_name => c_task_name , sql_stmt => c_chunk_statement , by_rowid => FALSE ); The remainder of the apply_raise procedure remains unchanged. Code Listing 2: Creating a chunking table
CREATE TABLE department_chunks (start_id INTEGER, end_id INTEGER) / BEGIN INSERT INTO department_chunks VALUES (1, 500); INSERT INTO department_chunks VALUES (501, 1000); INSERT INTO department_chunks VALUES (1001, 1500); COMMIT; END; / Chunking by Numeric ColumnIf the table I am changing contains a numeric column whose values can be used to define the ranges of rows to be updated in parallel, I can specify chunking with this procedure:
DBMS_PARALLEL_EXECUTE.CREATE_ CHUNKS_BY_NUMBER_COL ( task_name IN VARCHAR2, table_owner IN VARCHAR2, table_name IN VARCHAR2, table_column IN VARCHAR2, chunk_size IN NUMBER); In other words, I simply need to provide the name of the numeric column. Oracle Database 11g Release 2 then computes the MIN and MAX values of the specified column and divides the ranges evenly, as specified by the chunk size. To implement this approach, I would change the original apply_raise procedure. First, I change the WHERE clause of the update statement to specify ranges of department IDs:
c_update_statement CONSTANT VARCHAR2 (1000) := 'UPDATE EMPLOYEES emp SET emp.salary = emp.salary * (1.0 + pct_in/100) WHERE department_id BETWEEN :starting_deptid AND :ending_deptid' ; Next, I create chunks by the department_id column:
DBMS_PARALLEL_EXECUTE. CREATE_CHUNKS_BY_NUMBER_COL ( task_name => c_task_name , table_owner => USER , table_name => 'EMPLOYEES' , table_column => 'DEPARTMENT_ID' , chunk_size => 1000 ); The remainder of the apply_raise procedure remains unchanged. DBMS_PARALLEL_EXECUTE provides many more subprograms and supports many more features than can be covered in this column. For example, rather than simply asking to run a task with RUN_TASK, you can control chunk execution by getting a specific chunk with GET_ROWID_CHUNK and then executing it with EXECUTE IMMEDIATE. You can then immediately resolve any errors and decide if you want to commit the changes. This introduction to DBMS_PARALLEL_EXECUTE should make it clear that with Oracle Database 11g Release 2, it is much easier to parallelize execution of DML operations.
Steven Feuerstein (steven.feuerstein@quest.com) is Quest Software’s PL/SQL evangelist. He has published 10 books on Oracle’s programming language, including Oracle PL/SQL Programming and Oracle PL/SQL Best Practices (O’Reilly Media). Feuerstein’s self-appointed mission in life these days is to improve the quality and quantity of PL/SQL code testing. |
