DEVELOPER: PL/SQL Practices
On the PGA and Indexing CollectionsBy Steven Feuerstein
Best practices for knowing your PGA impact and indexing collections
My DBA wants me to reduce the amount of PGA (program global area) memory I use in my collection-based programs. Isn't it the DBA's job to manage memory, and if it isn't, how am I supposed to know how much PGA memory I am using?
Sure, it is ultimately the responsibility of the database administrator to analyze memory usage and to either make the necessary adjustments in the database configuration or recommend to the developers what should be changed in their code.
Having said that, I strongly encourage all developers to become as aware of and as involved as possible in all aspects of deployment of your code, to ensure the best-possible results for your users.
It is certainly possible, for example, for you to measure the amount of PGA memory your program uses. To do so, however, you will need to go back to your DBA and ask that person to grant SELECT privileges to your schema on the following views:
SYS.V_$SESSION SYS.V_$SESSTAT SYS.V_$STATNAME
Once you can query from these views, compile the show_pga_memory procedure, shown in Listing 1 (and included in the show_pga_memory.sp file in the sample code download for this column), in your schema. You can then call show_pga_memory after running your application code to see how large the PGA has become.
Code Listing 1: show_pga_memory procedure
CREATE OR REPLACE PROCEDURE show_pga_memory ( context_in IN VARCHAR2 DEFAULT NULL ) IS l_memory NUMBER; BEGIN SELECT st.VALUE INTO l_memory FROM SYS.v_$session se, SYS.v_$sesstat st, SYS.v_$statname nm WHERE se.audsid = USERENV ('SESSIONID') AND st.statistic# = nm.statistic# AND se.SID = st.SID AND nm.NAME = 'session pga memory'; DBMS_OUTPUT.put_line ( CASE WHEN context_in IS NULL THEN NULL ELSE context_in || ' - ' END || 'PGA memory used in session = ' || TO_CHAR (l_memory) ); END show_pga_memory;
When I used this procedure to test BULK COLLECT with different LIMIT clause values, I found that the size of my PGA ranged from 4,613,072 bytes to 46,883,792 bytes.
I want to use associative arrays to quickly look up an office product name for a given product number and a product number for a given product name. Product names are unique, and product numbers are integers. I see how I can use the product number as the index value in my collection of names, but I also need to reverse the process and find a product number for a given product name. Can I create another index on a collection's contents?
A collection has only one index, which is defined explicitly in associative array types with the INDEX BY clause (as integer or string) and implicitly in nested table and varray types (as integer only).
So if you need to find the index value (location) at which a given element resides, you will have to do one of the following:
I demonstrate both of these techniques below, based on a set of office product data that is initialized within the package body. Scan until you find it. First, I create a package whose specification includes a collection of product names, as shown in Listing 2.
Code Listing 2: Initial office_products package specification and body
CREATE OR REPLACE PACKAGE office_products IS TYPE names_list_aat IS TABLE OF VARCHAR2 (100) INDEX BY PLS_INTEGER; list_of_names names_list_aat; END office_products; CREATE OR REPLACE PACKAGE BODY office_products IS BEGIN list_of_names (1567) := 'Stapler'; list_of_names (75009) := 'Mousepad'; list_of_names (101) := 'Pencil sharpener'; END office_products;
Suppose I have the name of the product and want to find its index value so that I can remove that element. Listing 3 contains the product_lookup1 function, which scans the collection until it finds a match on the name.
Code Listing 3: Function for finding name of product
CREATE OR REPLACE FUNCTION product_lookup1 (NAME_IN IN VARCHAR2) RETURN PLS_INTEGER IS c_count PLS_INTEGER := office_products.list_of_names.COUNT; l_index PLS_INTEGER := office_products.list_of_names.FIRST; l_return PLS_INTEGER; BEGIN WHILE (l_index IS NOT NULL) LOOP IF office_products.list_of_names (l_index) = NAME_IN THEN l_return := l_index; l_index := NULL; ELSE l_index := office_products.list_of_names.NEXT (l_index); END IF; END LOOP; RETURN l_return; END product_lookup1;
This is a very simple program, easy to understand and maintain. Yet it may perform inefficiently for large collectionsespecially if I try to locate the index value for a name that is not in the list. To do so, I would have to scan the full contents of the collection. Clearly, I'd rather not have to look through the collection element by element.
Emulating a table column index. A much better approach is to follow the example of Oracle Database. If you want to quickly find a row in a table based on a given column value, you create an index on that column. For this question, if I were working with a relational table, I would create an index on the product name.
I can do something similar for a collection, but to do so, I will have to define a second collection to emulate the index on a table column. For the remainder of this column, I will use the term index collection when referring to the collection I create to emulate an index on a table column.
A collection can have only one index, and my list_of_names collection is already indexed by product number. So I will have to create a second collection and index this one by product name. Listing 4 shows the modified office_products package specification.
Code Listing 4: Modified office_products package specification
CREATE OR REPLACE PACKAGE office_products IS SUBTYPE name_t IS VARCHAR2(100); SUBTYPE index_t IS PLS_INTEGER; TYPE names_list_aat IS TABLE OF name_t INDEX BY index_t; TYPE index_list_aat IS TABLE OF index_t INDEX BY name_t; list_of_names names_list_aat; index_by_name index_list_aat; END office_products;
Note, first of all, that I no longer use hard-coded declarations for the types referenced in my collection types. Instead, I have switched to using subtypes.
This approach is completely optional but highly recommended. In other words, I could define my collection types as follows:
TYPE names_list_aat IS TABLE OF VARCHAR2(100) INDEX BY PLS_INTEGER; TYPE index_list_aat IS TABLE OF PLS_INTEGER INDEX BY VARCHAR2(100);
Both approaches will execute in exactly the same way. So why bother with subtypes? Because the resulting code explains itself much more clearly, making it easier to understand, debug, and maintain.
Because I'm using subtypes, my package specification tells mevery explicitly and without the need for commentsthat I have these two collections to work with:
Of course, I still need to populate these collections, so let's look at the package body in Listing 5.
Code Listing 5: Modified office_products package body
CREATE OR REPLACE PACKAGE BODY office_products IS PROCEDURE initialize IS PROCEDURE add_product (index_in IN index_t, NAME_IN IN name_t) IS BEGIN list_of_names (index_in) := NAME_IN; index_by_name (NAME_IN) := index_in; END add_product; BEGIN add_product (1567, 'Stapler'); add_product (75009, 'Mousepad'); add_product (101, 'Pencil sharpener'); END initialize; BEGIN initialize; END office_products;
I have, first of all, changed the structure of the body, having moved the lines of code used to populate the names collection into its own initialize procedure. I then call that procedure in the initialization section of the package. I find it easier to maintain such code in its subprogram than to have a long, complex block of code at the end of my package. Also, I could now, if necessary, expose this subprogram in the package specification so that it can be reinitialized as needed during a session.
The initialize procedure itself contains a local procedure, add_product, that takes a name and does two things:
I then call add_product for each row in the products table.
With this emulated index in place, my lookup function now becomes very simple, as shown in Listing 6. This function returns the index value for the specific name, if a row exists for that name. If not, the function simply returns NULL. Because NULL is never allowed as an index value, it is a good indicator for "Sorry, no product with that name."
Code Listing 6: Modified product lookup (product_lookup2) function
CREATE OR REPLACE FUNCTION product_lookup2 (NAME_IN IN office_products.name_t) RETURN PLS_INTEGER IS l_index PLS_INTEGER; BEGIN IF office_products.index_by_name.EXISTS (NAME_IN) THEN l_index := office_products.index_by_name (NAME_IN); END IF; RETURN l_index; END product_lookup2;
This second approach, taking advantage of the emulated index, results in a very simple lookup program, but, of course, the setup code is more complicated. Increased code complexity is justified if I can get a substantial boost in performance. Let's see if that is the case.
Comparing the performance of lookup programs. I want to compare performance for these scenarios:
First I change the initialize procedure so that I can load a nontrivial volume of data into the collections (I will no longer populate the collections in the initialization section), as shown in Listing 7.
Code Listing 7: Modified initialize procedure
CREATE OR REPLACE PACKAGE BODY office_products IS PROCEDURE initialize (count_in IN PLS_INTEGER) IS PROCEDURE add_product (index_in IN index_t, NAME_IN IN name_t) IS BEGIN list_of_names (index_in) := NAME_IN; index_by_name (NAME_IN) := index_in; END add_product; BEGIN list_of_names.DELETE; index_by_name.DELETE; add_product (1567, 'Stapler'); add_product (75009, 'Mousepad'); add_product (101, 'Pencil sharpener'); FOR indx IN 1 .. count_in LOOP add_product (indx * 1000, 'Product ' || indx); END LOOP; END initialize; END office_products;
I then construct a compare_lookups procedure that relies on DBMS_UTILITY.GET_CPU_TIME to help me calculate elapsed CPU time down to the hundredth of a second. Listing 8 includes the code for timing the performance of the collection scan implementation.
Code Listing 8: compare_lookups procedure
CREATE OR REPLACE PROCEDURE compare_lookups ( coll_count_in IN PLS_INTEGER , iterations_in IN PLS_INTEGER ) IS l_time_before PLS_INTEGER; l_index PLS_INTEGER; BEGIN office_products.initialize (coll_count_in); . . . l_time_before := DBMS_UTILITY.get_cpu_time; FOR indx IN 1 .. iterations_in LOOP l_index := product_lookup1 ( 'Product ' || TO_CHAR (coll_count_in / 2)); END LOOP; DBMS_OUTPUT.put_line ( ' Scan Find - name exists = ' || TO_CHAR (DBMS_UTILITY.get_cpu_time - l_time_before)); . . . END compare_lookups;
I then call this compare_lookups procedure with variations in the number of rows in the collection and the number of calls to the lookup programs:
BEGIN compare_lookups (100, 1000); compare_lookups (1000, 1000); compare_lookups (100000, 1000); compare_lookups (100000, 10000); END;
Listing 9 includes the results I obtained (in hundredths of a second) in an Oracle Database 10g Release 2 instance on a laptop with 4GB of memory.
Code Listing 9: Results of compare_lookups procedure
COUNT in collections = 100 Number of times program lookup called = 1000 Scan Find - name exists = 0 Scan Find - name does not exist = 2 Index Find - name exists = 0 Index Find - name does not exist = 0 COUNT in collections = 1000 Number of times program lookup called = 1000 Scan Find - name exists = 12 Scan Find - name does not exist = 11 Index Find - name exists = 2 Index Find - name does not exist = 0 COUNT in collections = 100000 Number of times program lookup called = 1000 Scan Find - name exists = 681 Scan Find - name does not exist = 1391 Index Find - name exists = 0 Index Find - name does not exist = 0 COUNT in collections = 100000 Number of times program lookup called = 10000 Scan Find - name exists = 6578 Scan Find - name does not exist = 14043 Index Find - name exists = 1 Index Find - name does not exist = 0
From this data, I draw the following conclusions:
Finally, as with any performance analysis and conclusions, you should run the script compare_lookups.sql (available in the sample code download for this column) yourself on your version of Oracle Database and confirm the results. Don't forget. . . . Collections offer a very efficient mechanism for looking up data. Retrieving data from a collection cached in the program global area is much faster than doing so from a persistent table's data cached in the system global area.
You must always remember the following, however, when using package-level collections in this way:
If you are developing your software for Oracle Database 11g, you should take advantage of the function result cache, which I wrote about in the September/October issue of Oracle Magazine ("On the PL/SQL Function Result Cache,"). It offers performance similar to that of package-based caches but is shared across all sessions in the database instance and is automatically invalidated if any changes are made to dependent tables.
Regardless of the approach you take, remember that collections have only a single index but that you can always add your own emulated indexes by populating other collections with the cross-referencing information.
Steven Feuerstein (firstname.lastname@example.org) is Quest Software's PL/SQL evangelist. He has published 10 books on Oracle's programming language, including Oracle PL/SQL Programming and Oracle PL/SQL Best Practices (O'Reilly Media). Feuerstein's self-appointed mission in life these days is to improve the quality and quantity of PL/SQL code testing.