By Steven Feuerstein
Follow self-managing databases with self-managing PL/SQL.
Oracle has introduced significant management automation features in its flagship database product and has described the new Oracle Database 10g release as "the self-managing database." Given that achievement and direction, I thought it would make sense to talk about how to write one's PL/SQL application so that it is also self-managing.
What, you might ask, would it mean for an application to be self-managing? Let's dream for a moment: I have built an application composed of 50 tables and 200 programs. It is running in production, but the users have a request for an enhancement. This enhancement requires a new column in table A and changes in 6 of the 200 programs. Yet 55 of the 200 programs reference table A.
When I add the column to table A, all 55 programs are marked INVALID, marking, in turn, another 72 programs INVALID. I make the necessary changes to those 6 programs, testing them thoroughly. I then recompile any invalid programs, at which time they automatically adapt to the changes in my data structures and compile cleanly. My application is ready once again for production.
Doesn't that sound very pleasant? Unfortunately, most of us would not be able to enjoy such an experience. Rather, we write our code so that the slightest change in data structures wreaks havoc in our source, not only marking excessive numbers of programs INVALID but also requiring extensive and ultimately unnecessary modifications before the application is running again.
This article offers several suggestions for ways to write your code so that your application is, as much as possible, self-managing, requiring the absolute minimum amount of attention and adjustment to keep it running, even in the face of data structure modifications.
Use Anchored Declarations and SUBTYPEs
Consider the following code:
1 CREATE OR REPLACE PROCEDURE process_employee ( 2 employee_id_in IN number) 3 IS 4 l_fullname VARCHAR2(200); 5 BEGIN 6 SELECT last_name || ',' || 7 first_name 8 INTO l_fullname 9 FROM employee 10 WHERE employee_id = employee_id_in; 11 12 ... /*other processing*/ ... 13 END;
I pass in the primary key value ( employee_id ) and use it to retrieve the full name of an employee in the format "last comma first." The way I have written this code makes it very vulnerable, unfortunately, to relatively minor changes in the underlying employee table. Specifically, I have hard-coded the datatypes of my parameter (line 2) and variable (line 4). Let's explore the impact of this rather lazy approach to writing code.
Suppose the DBA changes the datatype of the primary key from NUMBER to VARCHAR2 and starts using alphanumeric values. The process_employee program will be marked INVALID, because it queries the employee table. It will recompile cleanly, without a programmer having to make any modifications. Yet I will not be able to call this procedure with an employee ID that contains a letter, because the datatype has been hard-coded to NUMBER .
A much better way to write this procedure header is
1 CREATE OR REPLACE PROCEDURE process_employee ( 2 employee_id_in IN employee.employee_id%TYPE)
I have now "anchored" the datatype of the parameter directly to that of the table's column. Every time my procedure is compiled, PL/SQL will look up the datatype of the column from the data dictionary and use that in the compiled code. So if the DBA does make the column alphanumeric, no harm will befall this code.
The declaration on line 4, unfortunately, cannot be resolved so easily—and it is much more likely to cause problems in my application. Here is the thought process that results in such a declaration: I know that I should anchor my declaration of l_fullname so that as the lengths of my names grow, my variable will automatically adjust. Yet what can I use as an anchor? The following declaration avoids the hard-coding but does not avoid the problem of a VALUE_ERROR exception if the combination of first and last names exceeds the length of a last name:
The problem here is that the full name is a derived value. There simply isn't a column I can use in an anchored declaration. For such situations, however, Oracle offers a fine solution: create and use a SUBTYPE .
A SUBTYPE is an alias for another already defined datatype. In the following "employee rules" ( employee_rp ) package specification, for example, I define a new datatype called fullname_t , which is really nothing more than another name for VARCHAR2(1000):
1 CREATE OR REPLACE PACKAGE employee_rp 2 AS 3 SUBTYPE fullname_t IS VARCHAR2 (1000); 4 END employee_rp;
With this SUBTYPE in place, I can adjust my declaration to
I have now removed the hard-coding from my process_employee procedure's declaration. Does this allow my code base to manage itself? Not completely, but it's much better than before. If all developers take advantage of predefined, application-specific types such as fullname_t whenever they declare a derived value, they will reduce the points of failure to one place in their code—in this case, the employee_rp package.
If there comes a time when 1,000 characters isn't enough to hold last and first names, I change just this single line of code in employee_rp and recompile my code base, and everything is up and running with the current, valid definition.
We PL/SQL developers generally live in the moment. We write code today to satisfy today's requirements. We don't give much thought to what our code might be doing or changing to do in the future. We also write an awful lot of queries in our code. The intersection of "coding for the present" and querying data can lead to significant problems in our code in the future. Let's look at an example.
I am building an application to keep track of employment levels throughout the Americas, so that a government agency can analyze the impact of the North American Free Trade Agreement (NAFTA). In order to make it easier to optimize and maintain the code, I have hidden all SELECT statements against the employee_level table in the employee_ level_qp package ("QP" for Query Package) and made the result sets available through REF CURSOR s.
Here is a demonstration of this technique. The employee_level table is defined as follows:
1 CREATE TABLE employee_level ( 2 employer_id INTEGER 3 ,employee_count INTEGER 4 ,measurement_date DATE 5 );
First, I define the package specification, which includes a REF CURSOR type declaration and the header of a function that returns all columns for all the rows in the table, as shown in Listing 1.
I then implement the function, using the OPEN FOR statement, as shown in Listing 2.
With this package in place, I can query data from the employee_level table. The analyze_data procedure in Listing 3 illustrates the kind of code developers often write.
I have already shown the problems with hard-coded declarations like those found in lines 3 through 6. Line 10, however, presents new challenges. When I first wrote this code, the employee_level table had just three columns, and this code worked just fine. Three months after the application went into production, however, the DBA added a fourth column: examiner_id , a foreign key to the examiner table, so that she could keep track of who examined the employer to determine the employment level.
With the addition of the new column, the analyze_data program would be marked INVALID—and any attempts to recompile this program would fail. The INTO clause of the FETCH on line 10 must now contain four variables, not three. Unfortunately, the number of elements is hard-coded in the very structure of the statement; it will not adjust to match the new structure of the table.
There is, fortunately, a very simple solution to this problem: Never fetch into a list of individual variables. Instead, always fetch into a record structure, as you see in the new implementation of analyze_data shown in Listing 4.
Now, whenever the table is changed and analyze_data is recompiled, the l_one_level record will have a field for each column in the table and the FETCH statement will be valid. No programmer intervention needed.
The DUAL table is an odd little structure. Residing in the SYS schema, this table is used by PL/SQL developers to access SQL-specific functionality that is not natively available in PL/SQL.
The most common application of the DUAL table is to generate and retrieve the next value in a sequence, usually for the primary key of a table, as in the following:
1 SELECT employee_seq.NEXTVAL 2 INTO :employee.employee_id 3 FROM DUAL;
Oracle does not allow us to reference employee_seq.NEXTVAL natively in PL/SQL code; it must be called from within a SELECT statement. Why do we use the DUAL table for this activity? Because it has—or should have—just a single row in it. The NEXTVAL operator thus returns the next (and only one) value from the sequence. Unfortunately, it is not uncommon to hear from attendees at my seminars about situations in which the DUAL table has more than one row. In such a case, my preceding query would fail with a TOO_MANY_ROWS error. Even worse, certain built-in functions in PL/SQL, such as USER , would also fail, as is clear from this code taken from the stdbody.sql file delivered with Oracle Database:
1 FUNCTION USER 2 RETURN VARCHAR2 3 IS 4 c VARCHAR2 (255); 5 BEGIN 6 SELECT USER 7 INTO c 8 FROM SYS.DUAL; 9 10 RETURN c; 11 END;
It is hard to have a self-managing code base if (a) you rely on the DUAL table to generate sequence values and (b) you don't safeguard against having more than one row in the DUAL table. You could, of course, define a trigger on the DUAL table to reject any attempts to put more than one row into DUAL . User-defined triggers on SYS -owned tables are, however, a no-no. So what are developers to do if they want to avoid the DUAL table? Create and use a one-row table of their own, with the same structure as DUAL .
Listing 5 shows the code to both create such a table and also define a trigger on that table to insist that there can be only one row.
With onerow defined, I can now get my next primary key value as follows:
1 SELECT employee_seq.NEXTVAL 2 INTO :employee.employee_id 3 FROM onerow;
Because I have much more control over the contents of onerow , this is a better solution than querying from DUAL . Still, this is less than ideal. What if a future version of Oracle Database allows direct reference of <sequence>.NEXTVAL in PL/SQL? I will be stuck with all of this now-outmoded code that I want to replace with the much simpler (and more efficient)
:employee.employee_id := employee_seq.NEXTVAL;
So as a final step toward making my application resistant to breakage and easily upgradable over time, I hide my query behind a "generic" function that utilizes native dynamic SQL to generically retrieve the next value of any sequence, as shown in Listing 6.
Now I can obtain my primary key value with nothing more than this:
:employee.employee_id := next_pky ('employee_seq');
And when Oracle Database removes its restriction on referencing NEXTVAL , I change the implementation in this one function; after recompilation all of my code is upgraded to the latest capabilities of the Oracle Database. Although this step does not lead to 100-percent self-management, it certainly helps move your application base to a "minimal management" profile.
You can even take advantage of native dynamic SQL to combine all those steps into a single procedure that will set up a replacement for the DUAL table in your schema. For more information about this approach, check out the replace_onerow.sp file in the code download for this article.
READ about Oracle Database 10g
Most PL/SQL developers are aware of a common problem with a code base of any size: the need to recompile large percentages of that code base when programs are marked INVALID by data structure changes. A self-managing application should, at a minimum, be able to keep its own code base in a VALID state.
Oracle provides a utility, DBMS_ UTILITY.COMPILE_SCHEMA , that compiles procedures, functions, packages, and triggers in the specified schema. This procedure identifies all the invalid program units and recompiles them in alphabetical order.
To complement DBMS_UTILITY.COMPILE _SCHEMA , Solomon Yakobson has written his own recompilation utility, recompile_ invalid_objects , that uses CONNECT BY-START WITH to walk the dependency tree of database objects, recompiling from the "bottom up." (You can download this utility with the code download for this article.) After you run the recompile_ invalid_objects procedure, all programs in the specified schema will be VALID.
The recompile_invalid_objects utility is handy, but it doesn't run itself, which is what we need for a self-managing application. So take advantage of DBMS_SCHEDULER (or DBMS_JOB , prior to Oracle 10g) to create a job ensuring that all programs in a schema are VALID. The code in Listing 7 sets up a job to run recompile_invalid_objects every 30 minutes.
It would be a very strange application in which changes to the underlying data structures do not require at least some changes in code. Thus, a 100-percent self-managing application is a bit of an unachievable ideal.
It is quite feasible, on the other hand, to write your code so that as data structures change, you can isolate and minimize the need for manual code modifications. The techniques covered in this article—anchor declarations, fetch into records, avoid DUAL —help you move toward the ideal.