Steven Feuerstein PL/SQL
Building a Code-Analysis Utility and Doing It Right the First Time
By Steven Feuerstein
OTN Member since 2001

A behind-the-scenes look at constructing a utility for validating code quality

Download overload_check.zip

One particular problem that nags at me is that the pressure of producing code under deadline often result in writing some, shall we say, suboptimal software. It may meet the minimum requirements. It may seem to work. But it is almost impossible to maintain, enhance, and debug.

The overarching problem many developers and development managers face is figuring out how to improve the quality of the code we write. I've decided to take on this challenge in a fairly unusual manner. Over the course of the next few months, I will publish a series of articles on OTN that shows how I created OverloadCheck, a PL/SQL package that relies on information in the data dictionary to perform a nontrivial task—analyze packages to see if they contain ambiguous overloadings. Rather than simply present the result, I invite you to take a journey with me as I employ top-down design techniques and many of the best practices I recommend in Oracle PL/SQL Best Practices.
The sidebar "Steps to Creating a Utility" is located at the end of this article.

Exploring this utility will take time, because I'm going to take this opportunity to go through several phases in the development life cycle. Over the next several months, I will do the following:

  • Define the problem I want OverloadCheck to solve and specify the requirements.
  • Come up with test cases to validate that my utility works.
  • Research technologies available to help me solve my problem.
  • Develop an overall design for the utility (and discover that testing requirements impact my design).
  • Follow stepwise refinement to build my solution in chunks of code that are easy to write, understand, and deploy.
  • Use the utPLSQL unit-testing framework to automate comprehensive regression testing for my utility.

(For more details on what each of the articles on building OverloadCheck will cover, see the sidebar, "Steps to Creating a Utility.")

By following along in the tale of the conception of OverloadCheck, you'll get exposure to some of Oracle PL/SQL's latest and greatest new features, such as multilevel collections. And I'll provide a platform of sorts that you can build on to add your own QA checks, such as checking for too many or too few parameters, looking for programs required in all packages, and making sure your code conforms to naming conventions. Perhaps the greatest benefit in coming along for the ride is that you'll get a chance to see best practices put to use, which is probably the easiest way to learn how to use them yourself.

I have a confession to make: I have already produced a working version of OverloadCheck. I'm improving it as I write the story of its construction. Consequently, the download for each of the articles in this series contains OverloadCheck (overload_check.zip). Feel free to put it to use immediately. If you encounter bugs or suggestions for improvements, send me a note at steven@stevenfeuerstein.com.

Defining the Problem: Overloading Ambiguities in Packages

Worrying about all aspects of writing high-quality software could drive a person mad. Rather than get overwhelmed and then abandon any initiative to begin creating OverloadCheck, I am going to focus on a very particular problem: It is quite possible to write and successfully compile a package in PL/SQL that contains programs you cannot invoke. That doesn't make much sense, does it? Let's take a look at how this odd situation can come about.

Oracle PL/SQL supports overloading, also known as static polymorphism. This means that you can define two or more programs with the same name in any declaration section or package as long as they differ sufficiently (usually in the parameter list) so that the compiler can tell which program you want to use. Overloading is a useful technique for improving the usability of your code. It can, however, present some challenges, especially if you have long parameter lists containing combinations of arguments with and without default values.

Just to convince you of the problem developers can face in this realm, consider a few examples of what can go wrong, starting with the following package specification:

CREATE OR REPLACE PACKAGE salespkg
IS

   PROCEDURE calc_total (zone_in IN VARCHAR2);

   PROCEDURE calc_total (reg_in IN VARCHAR2);
END salespkg; 
/

It compiles just fine, as does its body. I have two overloading programs, both named calc_total. One of them takes a zone, such as 'ZONE 15', and computes the total sales for that zone. The second accepts a region, such as 'SOUTHWEST', and computes the total sales for that region. Yet when I try to call one of those programs, I get an error:


SQL> exec salespkg.calc_total ('ZONE 15')
BEGIN salespkg.calc_total ('ZONE 15'); END;
      *
ERROR at line 1:
ORA-06550: line 1, column 7:
PLS-00307: too many declarations of 'CALC_TOTAL' match this call

The error message points out the problem specifically: "Too many declarations of CALC_TOTAL match this call." You see, computers aren't very smart. You and I can both see that 'ZONE 15' is a zone; shouldn't the PL/SQL compiler be able to recognize this as the "zone version" of calc_total (the overloading with the zone_in parameter)? Unfortunately, that's not the way it works. 'ZONE 15' is a string literal, not analyzed by the compiler. The compiler doesn't know which to use and throws up its hands in disgust.

How can I solve this problem? In this particular case, I could use named notation to get around the ambiguity:

BEGIN
   salespkg.calc_total (zone_in => 'ZONE 15');
END;

In this instance, I'm telling the compiler to use a specific parameter (zone_in). Because only one of the two overloadings has a parameter with this name, the compiler now knows which of the two to use. It's unacceptable, though, that I would have to use named notation in order to call a procedure or function. This is clearly bad design—and it gets worse from here. Consider this package specification:

CREATE OR REPLACE PACKAGE salespkg

IS
   PROCEDURE calc_total (zone_in IN VARCHAR2);

   PROCEDURE calc_total (zone_in IN CHAR);
END salespkg; 
/

This package will, once again, compile without any error. But now I face a different situation: It is entirely impossible to call either of these procedures. They share the same program name and parameter name. The only difference is the datatype. VARCHAR2 is certainly different from CHAR, so the compiler gives you its seal of approval. Unfortunately, these two datatypes are not different enough, which leads to difficulties when you actually try to use the code. Consider the following piece of code:

BEGIN
   salespkg.calc_total ('ZONE15');

END;

Is 'ZONE 15' fixed length or variable length? The PL/SQL documentation says that "All string literals except the null string ('') have datatype CHAR," but the compiler doesn't seem to be aware of this fact. Oddly enough, even if I pass the program an explicitly declared fixed-length string, I still get an error:

SQL> DECLARE
  2     l_zone CHAR(6) := 'ZONE15';
  3  BEGIN
  4     salespkg.calc_total (l_zone);
  5  END;
  6  /
   salespkg.calc_total (l_zone);
   *

ERROR at line 4:
PLS-00307: too many declarations of 'CALC_TOTAL' match this call

As you can see, it is indeed possible to define package overloadings that compile cleanly but that are either unusable or usable only via "unnatural acts," such as mandatory use of named notation.

With such a simple package, it's not hard to eyeball the code, find the problem, and fix it quickly. What happens, though, when you have 10 overloadings, each with a dozen arguments, each with various combinations of trailing defaulted IN parameters, which can be left off the parameter list? It can be difficult for developers to identify such problems by eyeballing the code. Invalid or potentially ambiguous overloadings can go undetected for a long time (specifically, until that particular combination of parameters is used). It certainly would be nice to uncover such problems before the code author is long gone.

What's My Solution?

The challenge of identifying such overloading problems got me thinking: Perhaps I could build a utility that would "automagically" sweep through a package, examine all possible legal permutations of program invocations, and bring ambiguous overloadings to my attention. Could I do that? Seems like I would need to be able to parse the package specification into its separate programs and also parse each program's parameter lists. I would need to be able to pick out datatypes and default values of those parameters. How could I get at this information? Sadly, I don't have API-level access to a PL/SQL parser, especially not from within PL/SQL itself. And I don't want to even think about writing my own parser. What's an aspiring (obsessed) utility constructor to do? Search out alternatives.

How else might I be able to extract this information from a program? I recall that whenever I compile a PL/SQL program, the Oracle database parses the source code and loads it into the data dictionary. It then provides various data-dictionary views offering different, well, views into that stored code. ALL_SOURCE reveals the source code. ALL_DEPENDENCIES shows the dependencies between objects. ALL_OBJECTS tells me which of my programs is INVALID. Maybe there is a data-dictionary view that could help in this regard. How could I find out? There are so many views, and they can be rather obscure.

To help, I built a utility called dd_view_scan, which looks for views that may meet my needs (see Listing 1; full implementation is in the ddviewscan.sp file). Using dd_view_scan, I can easily search the set of views to uncover any possibly helpful data sources. I can then drill down on a particular view to see if it does indeed contain the information I need. For example, if I want to analyze overloadings of programs, I need to not only know the names of the programs but also examine the parameter lists (also known as arguments). First I ask dd_view_scan about any data-dictionary views that contain the words parameter or argument. You can see the results in Listing 2. (Note: I obtained these results with Oracle9i, Release 2. The same query run with earlier versions may give different results.)

Reviewing the list, I'm drawn to ALL_ARGUMENTS. The others seem to be more specialized than I would like. When I drill down into that argument, I find package name, argument name, and datatype (see Listing 3). This looks promising and warrants further examination. Let's go exploring.

Research: All About ALL_ARGUMENTS

Next stop, Oracle Documentation Set. I eagerly turn to the Oracle9i documentation for at least a first attempt at getting answers. I bring up the documentation in my browser and, using the Master Index, immediately zoom in on ALL_ARGUMENTS. Table 1 shows the description of each of the columns of this view. You can, by the way, obtain similar, although not identical, information with the following query (see all_arguments_cols.sql):

SELECT column_name, comments

  FROM all_col_comments 
 WHERE table_name='ALL_ARGUMENTS'

Sadly, what you see in Table 1 is all the information Oracle provides. It is, regardless, a starting point. The import of many of the columns is self-evident. For others, such as OVERLOAD, POSITION and DATA_LEVEL, it's less clear.

Next, I need to make sure I understand the contents of ALL_ARGUMENTS. I also must prove to myself that ALL_ARGUMENTS contains what Oracle says it does. If you've been around Oracle technology for longer than, say, six months, you know that you can take nothing for granted. Just because the documentation says something, doesn't mean that the technology works that way. And just because an author such as Steven Feuerstein says that it works with Oracle9i, Release 2, running Windows 2000 on his laptop doesn't mean that it will work in the same way on your system. It's important to run your own tests and validate the behavior that you'll be relying on in your application.

Here's what I did: I put together a package named allargs_test (which you'll find in the all_arguments.tst file). It defines a package with subprograms that have various combinations of parameters (or lack thereof), which in turn use a wide assortment of datatypes, parameter modes, and so on.

I then constructed some queries to examine what ALL_ARGUMENTS has to offer about this package. You'll find these queries in the allargs*.sql scripts. To give you a feel for the contents of ALL_ARGUMENTS, I'll review some of the results of those queries. Then I'll offer a list of findings that will guide my design and implementation of OverloadCheck.

Suppose I have a package with the following specification:

CREATE PACKAGE allargs_test
IS 
   PROCEDURE difftype1;

   FUNCTION difftype1
      RETURN VARCHAR2;
END;

Then the contents of ALL_ARGUMENTS for these programs will be as follows:

Core ALL_ARGUMENTS Info for "allargs_test.difftype1"

OVLD  ARGNAME           POS   SEQ   LVL TYPE            DEFVAL IN_OUT
----- --------------- ----- ----- ----- --------------- ------ ---------

1                         1     0     0                          IN
2                         0     1     0 VARCHAR2                 OUT

The results lead me to these conclusions:

  • The OVERLOAD column (abbreviated to OVLD in the previous example) indicates the Nth overloading if programs are overloaded; otherwise, it is NULL.
  • When a procedure has no arguments, ALL_ARGUMENTS contains a row that has a position of 1 and a sequence and level of 0.
  • When a function has no arguments, it does not have a special row. Instead, it simply has a single row for the RETURN clause, in which the position is 0.
  • argument_name for the RETURN clause is NULL.

Now consider Listing 4, which shows some of the rows in ALL_ARGUMENTS for the procedure composites, which is absolutely packed with nonscalar, or composite, datatypes, such as records and collections (see all_arguments.tst for the complete definition of each of these types):


CREATE OR REPLACE PACKAGE allargs_test
IS
   PROCEDURE composites (
     account_in NUMBER,
     person person%ROWTYPE,
     multirec myrec3,
     num_table number_table,
     recs_table myrec_table);
END;

When I analyze this information, I reach these conclusions:

  • The level shows the number of levels of nesting in the parameter list. All arguments that actually appear in the parameter list are at level 0. If one of those parameters is a composite, however, then the individual elements of that composite (such as the fields of a record) appear in ALL_ARGUMENTS at level 1, and so on.
  • The combination of position and level are not unique for the ALL_ARGUMENTS rows of a given program, even within a single overloading. Instead, the position-level combination is unique only for a given "level 0" argument (the actual parameter in the parameter list).
  • The contents of ALL_ARGUMENTS can get awfully complicated, due to the ability to declare composites such as collections and records within other composites, with no limitations on the number of nestings.
  • The datatype field shows only the "generic" type, such as PL/SQL RECORD, but it does not reveal the actual type. For that, it looks like I need to delve into the TYPE_* columns of ALL_ARGUMENTS.

Listing 5 shows the contents of ALL_ARGUMENTS for the following programs:

CREATE OR REPLACE PACKAGE allargs_test
IS 
   PROCEDURE oneargdef (
      onearg IN VARCHAR2 := NULL);
   PROCEDURE oneargdef (
      onearg IN CHAR := 'abc');
END;

In this case, I make an interesting and troubling discovery: Even though ALL_ARGUMENTS claims that it shows you the default value for arguments, in fact, this column is always NULL. You'll discover this sort of problem only if you venture beyond the documentation and into the nitty-gritty reality of Oracle technology.

Finally, let's see what's in all those TYPE_* columns. I use allargs3.sql to show me information only for arguments with non-NULL TYPE_NAMEs. You can see the output for composites in Listing 6. This is useful information. If I rely only on the value in the DATA_TYPE column, I can discern that the datatype is a PL/SQL table, but I can find out which type of table only by looking at the TYPE_SUBNAME column.

To summarize: On the one hand, ALL_ARGUMENTS looks useful. It provides a bonanza of information about arguments. On the other hand, it seems to be missing some crucial information about the presence of default values for arguments. Without this information, I can perform only the most basic kind of analysis of overloadings (Do the programs have the same number and type of parameters?). I can't report on overloaded programs that will be ambiguous when an invocation uses a subset of parameters (and trailing defaulted parameters are left off), but I'd like my utility to handle that level of complexity. My next step is to try to get this information some other way.

What About Supplied Packages?

I've already looked at the set of data-dictionary views. What other prebuilt tools does Oracle offer? Considering that I co-authored a book called Oracle Built-in Packages, it's only natural that I look for a built-in or supplied package (Oracle terminology) that can provide assistance.

I check ALL_OBJECTS for any interesting packages:

SQL> SELECT *
  2    FROM all_objects
  3   WHERE owner = 'SYS' 
  4     AND object_name LIKE '%ARGUMENT%' 
  5     AND object_type = 'PACKAGE'; 

no rows selected

Nothing there. Next, I scan ALL_SOURCE, looking for such words as argument and parameter in the supplied packages' source code itself. Actually, I can view only the package specifications, because the bodies are wrapped (a crude and only partially successful form of encryption). Also, whenever you think about running such queries, you have to watch out for searches that go on for long periods of time, because ALL_SOURCE usually contains an abundance of rows (mine currently has 111,782 rows).

So I put together another little utility—the dd_source_scan procedure (check it out in the ddsourcescan.sp file of the download)—to make sure I do this as efficiently as possible. I would like to find any data-dictionary views that contain the name of the data-dictionary view ALL_ARGUMENTS:

BEGIN
   dd_source_scan (
      '%', 'all_arguments');
END;

Sadly, my scan turns up zero rows of source with that keyword. The next tactic is to relax the search criteria and look for package specifications that contain the word arguments:

BEGIN
   dd_source_scan (
      '%', 'arguments');
END;

Ah! That gives me lots of hits, some of which are shown in Listing 7. DBMS_DESCRIBE may very well be what I'm looking for.

Looking at the documentation, I find: "The procedure DESCRIBE_PROCEDURE accepts the name of a stored procedure, a description of the procedure, and each of its parameters." I must admit that this is a bit bewildering. It accepts all that? But then what does it give me? Sadly, I'm looking at some pretty poor documentation, which means I've spent a fair amount of time investigating the behavior and quirks of DBMS_DESCRIBE.DESCRIBE_PROCEDURE. As you'll discover later in this series, I'll take everything I have learned about this program and cache that knowledge in a separate package.

At this point, I'd like to examine a few of the differences between DBMS_DESCRIBE and ALL_ARGUMENTS and reflect on how they may affect my implementation plans. For example, this procedure returns the "description of the procedure" by passing back information on all of its arguments in a series of PL/SQL collections. The content of these collections roughly mirrors the data retrieved in a query of ALL_ARGUMENTS, but oddly enough (perhaps to be expected), each takes a different route and ends up in a slightly different place.

This is clear when you discover that although ALL_ARGUMENTS provides you with the name of a datatype, such as VARCHAR2 or OBJECT, DBMS_DESCRIBE returns an integer code. In general, DBMS_DESCRIBE relies heavily on integer codes, whereas ALL_ARGUMENTS translates codes into descriptive strings (which makes perfect sense, it being a data-dictionary view that lets you see information about program arguments).

Beyond that, I have learned that each source of structured argument data offers some information that the other does not, and each one is, therefore, lacking. So I came to the conclude that I need to use both. Rather than bore you with all the details, here's a summary of my findings:

Make DBMS_DESCRIBE the default. Because I need some of the information from each source, one should drive the data gathering and be the primary source of argument information. The other should serve as a lookup resource, as needed. I can't request just a single argument via DBMS_DESCRIBE. Instead, I get back all the argument information for a given program name. I can, however, retrieve information about a single argument with a SQL query against ALL_ARGUMENTS. So I'll call DBMS_DESCRIBE to retrieve the bulk of the data and then enrich it with a SELECT from ALL_ARGUMENTS.

Building a Code-Analysis Utility and Doing It Right the First Time: Part 2
E-mail this page
Printer View Printer View
Oracle Is The Information Company About Oracle | Oracle RSS Feeds | Careers | Contact Us | Site Maps | Legal Notices | Terms of Use | Privacy