By Steven Feuerstein
Now you can index by strings and improve performance.
Many PL/SQL programs manipulate data, often by using SQL to work with that data directly in the database. Yet you often need to declare and manage data within PL/SQL programs themselves. This program data might consist of individual values (scalars). In many other situations, however, you deal with much more complex data structures, from records to objects (instances of object types) to lists.
Lists (and, in their more complex manifestation, arrays) are crucial elements in a programmer's toolbox. To create and manage these lists, PL/SQL offers a variety of structures—called collections: nested tables, varying arrays, and associative arrays.
Associative arrays were introduced in Oracle9i Release 2 to replace index-by tables (which replaced PL/SQL tables in Oracle8). Associative arrays introduce significant new functionality. In Oracle9i Release 2, you can now:
Define collections that contain other collections within them. Called multilevel collections, these lists allow you to more intuitively and directly model real-world scenarios within PL/SQL data structures. I covered this feature in my May/June 2002 Oracle Magazine article, " Programming at Multiple Levels."
Create collections whose rows are indexed (and quickly retrievable) by strings. The name of a company can, for example, be the "row number" in the collection.
This article explores how to index associative arrays by strings and describes how you define such collections.
There is a new way to define and manipulate PL/SQL-specific collections. Specifically, you can index by strings, in addition to the integer values of the row numbers. This provides significant additional flexibility. Consider the example in Listing 1.
In Listing 1, line 2 declares an associative array type that stores numbers, indexed by strings and with a maximum length of 64 characters. Lines 3 and 4 declare two associative arrays based on this type. Line 5 declares a variable to store the number of rows in a collection. Line 6 declares a variable that holds the "limits" (lowest and highest row index values). Notice that this variable is declared as a string, since it will be receiving values for a string-based index. In lines 8 and 9, I assign values to the country_population collection. In each case, I assign the population of a country to a row in the collection. The index to that row is the name of the country. In line 11, I retrieve a value from the collection. Notice that I specify the row with the name of the country, and the script returns the population. In lines 18 through 20, I obtain the "first" defined row value, then display it, and then display the population for that country. How does the database determine the "first" row? The first row is determined according to the sort order specified by the character set in the database. In lines 22 through 24, I do the same process for the "last" defined row value as for the first defined row value.
It may take some getting used to, but with string-indexed collections, the values returned by a call to the FIRST, LAST, PRIOR , and NEXT methods are strings, not numbers.
So why would you want to index by string instead of by number? Suppose that you need to do some heavy processing of employee information in your program. You need to go back and forth over the set of selected employees, for example, searching by the employee ID number, the last name, and the social security number (or appropriate national identification number, for non-U.S. countries).
You can certainly use SQL to do all of that, but it may be far from the most efficient implementation. If you need to do multiple passes over a significant set of static data, you can instead move it from the database into a set of collections. Accessing collection-based data is much faster than going through the SQL engine.
You can then take advantage of string- and integer-based indexing on those collections to, in essence, mimic the primary key and unique indexes on the table (whose data you have transferred from the database). A simple example of this approach is shown in the code in Listing 2.
In Listing 2 , lines 1 through 4 declare two types of associative arrays. Notice that I am able to take advantage of %TYPE in one INDEX BY clause, and use PLS_INTEGER instead of BINARY_INTEGER in the other. These are both new capabilities in Oracle9i Release 2. In lines 6 through 8, I declare the collections I will use to provide multiple, fast entry points into the data. In lines 12 through 21, the load_arrays procedure shows how easy it is to transfer data from a database table to one or more collections—in this case, three. I deposit the entire row of employee data as a record into each collection, using different column values as keys. In line 17, I use the last name as the index value. In line 18, I use the social security number as the index value. In line 19, the primary key serves as the index (and is a fairly traditional application of the integer index as an "intelligent key"). In line 27, I perform a comparison of the salary fields in two different collections, using both string and integer index values.
I have been constructing a new and interesting utility that I call OverloadCheck, which takes advantage of Oracle9i Release 2 collection enhancements. This package sifts through the contents of the ALL_ARGUMENTS data dictionary view in order to analyze a specified package or standalone program for conformance to coding standards or to uncover some design problems, such as ambiguous overloadings.
ALL_ARGUMENTS contains information about each parameter or argument of each procedure or function that is stored in the database (and on which the connected user has EXECUTE authority). A single row in ALL_ARGUMENTS contains information about one argument (or, in some cases, one field or element of an argument). Here is a subset of the columns defined for ALL_ARGUMENTS :
Name Null? Type --------------- -------- ------------ OWNER NOT NULL VARCHAR2(30) OBJECT_NAME VARCHAR2(30) PACKAGE_NAME VARCHAR2(30) OVERLOAD VARCHAR2(40) ARGUMENT_NAME VARCHAR2(30) POSITION NOT NULL NUMBER DATA_LEVEL NOT NULL NUMBER DATA_TYPE VARCHAR2(30) DEFAULT_VALUE LONG IN_OUT VARCHAR2(9)
You can quickly see the hierarchy embedded in these rows: each object ( OWNER.OBJECT_NAME ) may be overloaded ( OVERLOAD is NULL if not overloaded or a positive number). Each argument within an overloading has a position, and within that position you can have multiple "levels" of argument information. In other words,
OBJECT_NAME OVERLOAD ARGUMENT_NAME POSITION LEVEL
In the space of this article, it isn't possible to present much of the implementation of the utilities in the OverloadCheck package. I would like to share with you, however, the code I wrote to transfer the data from ALL_ARGUMENTS to my collections, to make it easy for me to quickly analyze its contents.
In my first pass at the OverloadCheck package, I defined a collection type and collection that simply mapped to the data dictionary view as follows:
CREATE OR REPLACE PACKAGE BODY OverloadCheck IS TYPE args_t IS TABLE OF all_arguments%ROWTYPE INDEX BY BINARY_INTEGER; arguments args_t;
And with that collection defined, it takes a very small amount of code to grab the data for a particular program and stick it into my collection, as shown in Listing 3 .
I realized a few moments after writing the code in Listing 3, however, that I was thinking in outmoded ways. My plan was to fill up my collection and then write some complex code to scan through the arguments collection and deduce many things, such as:
The number and names of the unique programs in a package. I might have a total of 12 procedures but only 3 different program names, due to overloading.
The top-level argument entries (with DATA_LEVEL = 0 ).
Where in the collection the arguments for one program ended and another started.
The burden of figuring out the organization of data within ALL_ARGUMENTS and writing the required code was mine.
What if I define multiple, nested collections to store this data? Perhaps by taking this approach, the very organization of the collections would allow me to answer some of my questions much more naturally. After a fair amount of thinking and testing of alternatives, I came up with the hierarchy of collection types shown in Listing 4.
|FIGURE 1: The four nesting levels in the ALL_ARGUMENTS collection mapping|
Here is a description of what is going on in Listing 4. It is best in cases like this to work from the "bottom up," or from the outer layer of the hierarchy to the innermost collection. In line 15, I declare a collection of type programs_t . This single collection will contain all the information from ALL_ARGUMENTS , just as in the first attempt, but the organization of that information will be quite different. In lines 12 and 13, I declare collection type programs_t . Each row in a collection of this type contains all the information about the arguments of all the overloadings for a given program name. Notice that the index is the name of the object. In lines 9 and 10, I declare collection type overloadings_t . Each row in a collection of this type contains all the information about the arguments for a single overloading of a program. Now I am back to using an integer index, because the key is the OVERLOAD column value from ALL_ARGUMENTS . In lines 6 and 7, I declare collection type arguments_t . Each row in a collection of this type contains all the information about a single argument or parameter for a specific overloading. Again, I use an integer index, because the key in this case is the POSITION column value from ALL_ARGUMENTS (the position in the parameter list). In lines 3 and 4, I declare collection type breakouts_t . Each row in a collection of this type contains all the information about a single element of an argument. This could be a single row of data or many rows. If the argument in question is, for example, a record with 15 fields, then this collection would have at least 15 rows. I rely on an integer index, because the key in this case is the LEVEL column value from ALL_ARGUMENTS (0 indicates the actual argument in the parameter list).
Confused? First of all, don't get caught up in the complexities of the data in ALL_ARGUMENTS . I just wanted to give you a sense of the variations that I needed to handle in my code. Next, consider the diagram in Figure 1. Notice how the embedded hierarchy in the ALL_ARGUMENTS view is naturally expressed in these different levels.
With the g_programs collection in place, let's revisit the load_arguments procedure in the OverloadCheck package, as shown in Listing 5.
What has changed between the load_arguments procedure in Listing 3 and this version in Listing 5? Very little and a whole lot. The only statement that differs is in lines 12 through 16 of Listing 5. Here is the earlier, simple assignment in Listing 3:
arguments ( NVL (arguments.LAST, 0) + 1) := rec;
In Listing 5, it has become:
g_programs (rec.object_name) (rec.overload) (rec.position) (rec.data_level) := rec;
This is certainly more complex, but it is taking care of an awful lot of complexity for me. With this single assignment, for example, I am actually populating rows in all four collections, right down through the hierarchy. When I am done, it takes very little code to satisfy my retrieval needs. For example, to obtain the datatype of the RETURN clause of the second overloading of a function named TOTAL_SALES :
g_programs ('TOTAL_SALES') -- locate (2) -- second overloading (0) -- RETURN clause has position 0 (0) -- Top-level argument .data_type -- Value in the data_type field
In this situation, getting comfortable with this syntax is much less challenging than writing and debugging the logic needed to scan the contents of a "flat" (pre-Oracle9i Database) collection to extract such information.
While I don't expect you to immediately and fully comprehend the potential of these complex collection structures, I hope that the examples in this section have opened your eyes to the possibilities.
Used properly, collections help us write code that is both more efficient and easier to understand. The combination of multiple-level collections with string indexing can help reduce the complexity and volume of code needed to model and manipulate information from complex database designs.
Steven Feuerstein (firstname.lastname@example.org) is an authority on the PL/SQL language. Feuerstein is the author of six books on PL/SQL (all from O'Reilly & Associates), including Oracle PL/SQL Best Practices and Oracle PL/SQL Programming. He is a senior technology advisor for Quest Software.