PL/SQL

Getting It Right the Second Time
Installment 8 of Feuerstein's "Building a Code-Analysis Utility" series

by Steven Feuerstein

Invest a little time refactoring your code after it's written — and prevent bugs and maintenance hassles down the road

Even when you pay plenty of attention to a program's design up front, it can still benefit from some systematic review and clean-up after the code is written — a process known as refactoring. If you've been reading this series, you know that I've followed a rigorous, top-down design process in developing OverloadCheck, a utility that analyzes packages for ambiguous overloading (a situation in which two or more programs with the same name have parameters so similar that the PL/SQL compiler can't tell them apart). In this last article in the OverloadCheck series, we'll apply the concept of refactoring to some sections of OverloadCheck that benefit greatly as a result — becoming simpler, easier to follow, and easier to maintain. Then, we'll discuss the key lessons I learned in the process of developing OverloadCheck.

What is "Refactoring"?

In 1999, Martin Fowler published a book called Refactoring: Improving the Design of Existing Code (Addison-Wesley). In this book, he defined refactoring as:

The process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure... It is a disciplined way to clean up code that minimizes the chances of introducing bugs. In essence when you refactor you are improving the design of the code after it has been written.
Refactoring is a very insightful and useful book. It takes a disciplined, "one step at a time" approach to identifying commonly occurring weaknesses in code design. It then offers methodical, consistent approaches to strengthening your design, all without affecting the users of your code.

Fowler's book explores the technique of refactoring using Java as the example programming language. (The concept of refactoring originated within the Smalltalk programming community, but Java is similar to Smalltalk in being an object-oriented language.) This fact alone should be enough to convince you to spend a day or two learning the basics of Java programming. I recommend the book Thinking in Java, by Bruce Eckels (Prentice-Hall, 2002), as the best way to quickly get a grasp of the language. It doesn't take long to learn how to read Java code — and then you can learn from Refactoring all sorts of excellent techniques for improving your PL/SQL code!

We don't have room in this article to discuss the specifics of the many code-improving techniques Fowler suggests in his book, but we can use the basic concepts of refactoring to lead us to some concrete examples of how to improve already-written code. Our goal will be to look for places in OverloadCheck where we can do the following:

  • Simplify where things are done (reorganizing the code's structure and moving code from one place to another where it fits better)
  • Simplify how things are done (making code shorter, more elegant, and easier to follow)

In the process, we'll see that these goals are often intertwined. When you reorganize your code's structure, you often end up improving how it implements its logic — and vice versa.

Everything in its Place

When you go back and review the structure of your code after you've finished writing it, look for sections that may not be located in the most appropriate place. For example, you might find similar logic appearing more than once, indicating that you may want to consolidate the logic into one location. Or, you might notice a large concentration of calls to programs in another package — in which case the question naturally arises: might all of this code belong in that package? In the OverloadCheck package, one such section is pretty obvious when you look at the private function named same_program_types (see Listing 1).

The same_program_types function accepts a program name and pointers to two different overloadings for that program name. It returns TRUE if both of these overloadings have the same type (where the type can be either PROCEDURE or FUNCTION); otherwise, it returns FALSE. Here is an example of how the function is called in the OverloadCheck package:

IF same_program_types (
     program_in
    ,check_this_ovld_in
    ,against_this_ovld)
      THEN
         compare_all_invocations (...);
      END IF;

Now, let's look at what happens in the function itself (Listing 1). In the declaration section (lines 7 through 12), I call the cc_smartargs.is_function program to find out if each of the overloadings is a function. Then in the body of the function (a single RETURN statement shown in lines 14 through 18), I check to see whether these overloadings are both functions or both procedures. Those are — at least, as of this writing— the only options available to us!

As you can see, the main work of this function is performed by a function in another package: cc_smartargs.is_function. If that is the case, why wouldn't I put my same_program_types function in that package as well? The more general way of framing this question is: does the functionality described by same_program_types belong in the high-level OverloadCheck package, or does it belong in another, more narrowly-defined package, such as cc_smartargs?

The point of cc_smartargs is to hide as much detail as possible about the contents of ALL_ARGUMENTS — and to make the data "smarter" than that stored in ALL_ARGUMENTS. To accomplish these goals, I defined a four-level nested collection, of which the top level employs a string-based index based on the program name. The second level of nesting is indexed by the overloading for each program. As you can see by glancing at the header of same_program_types (Listing 1, lines 1 through 5), the parameter list of this function reflects those two levels of nesting. If the cc_smartargs package is meant to hide this kind of information, does it really make sense to expose details of cc_smartargs by putting the same_program_types function in the high-level OverloadCheck package?

I don't think so. Looking at the entirety of OverloadCheck, I can see now that same_program_types should be moved to the cc_smartargs function. The steps involved in my refactoring in this case are clear:

  1. Take the implementation of same_program_types function from OverloadCheck and put it in the cc_smartargs package body (preferably right below the is_function function, since the two functions are so closely related).
  2. Add the entry to the cc_smartargs package specification as well.
  3. Change any calls to same_program_types in OverloadCheck to use the cc_smartargs version.

These steps mostly involve cutting and pasting (carefully!), so I won't bother you with the details. The only change in OverloadCheck (besides reducing the overall length of the package) is to preface the call to same_program_types with the name of the new owner package, as follows:

IF cc_smartargs.same_program_types (
     program_in
    ,check_this_ovld_in
    ,against_this_ovld)
      THEN
         compare_all_invocations (...);
      END IF;

Am I done now? Not quite. Now that I am paying closer attention to this function, I am struck by the volume and complexity of the code I have written to perform what would seem to be a rather basic operation. Which brings me to my other goal in refactoring the program...

Keeping Things Simple

In general, the simpler your code is, the easier it is to debug, maintain, and enhance. So, if you ever get the feeling that a section of your program is too complex, follow your intuition and search for ways to simplify it. When you're refactoring, look for sections that seem longer and more complex than you'd expect them to be, given what they need to accomplish.

In the case of the same_program_types function, a second look led me to an idea about how to express the same logic more simply. This new implementation uses the Boolean expression in a more direct and intuitive fashion, as follows:

FUNCTION same_program_types (
   program_in     IN   VARCHAR2
  ,overload1_in   IN   PLS_INTEGER
  ,overload2_in   IN   PLS_INTEGER)
   RETURN BOOLEAN
IS
BEGIN
   RETURN (  
     is_function (program_in, overload1_in) = 
     is_function (program_in, overload2_in));
END same_program_types;

Notice that this new version of same_program_types has about half as many lines as the old version (shown in Listing 1), in addition to being more elegant in its logic.

And so, my refactoring of same_program_types (and the package in which it is most appropriately defined) is complete. In the process of refactoring, I removed the external references (calls to other packages) inside the same_program_types function, making the function self-contained. I also greatly simplified my implementation, making the code easier to maintain in the future.

Now, let's look at another refactoring of a OverloadCheck program — one that illustrates a useful strategy for both reorganizing and streamlining code.

Hiding the Details

In refactoring the same_program_types function, we started by questioning where the code was located, then ended up simplifying how the logic worked as well. Another helpful approach in refactoring is to look for overly-long sections of code that you can simplify by hiding details (moving them to a lower-level program or package) — again changing both how and where code is located to achieve a more streamlined, easier-to-follow program.

In building OverloadCheck, I tried to discipline myself to deal with only a certain amount of complexity at a time. I relied on top-down design (also known as stepwise refinement) to hide details until I absolutely needed to address them, trying my best to keep all executable sections very small and readable. There are, however, some lapses — which pop out at me quickly in the refactoring process when I scan my code for overly-long executable sections. (Anything longer than 20 or 30 lines is pretty easy to spot when you keep most executables short!) Let's take a look at one such lapse and refactor it to improve both its readability and maintainability.

The check_for_similarity program in OverloadCheck is a central element of the package — but does it really need to be 37 lines long (see Listing 2)? It contains the logic to iterate through each parameter of two overloadings and compare the datatype families for each parameter. If it determines that the parameter lists are "too similar" (for instance, if neither parameter list has any parameters, or if they have the same number and same types of parameters), then it reports the overloading with a call to cc_report.ambig_ovld.

As I look at these 37 lines of code in the dispassionate light of the refactoring process, I am struck immediately by two aspects of the code:

  • The conditional logic that determines if the parameters are too similar seems overly complex to me. I go through the various cases, setting a too_similar flag. Then, based on the value of that flag, I make my report. Do I really need these two phases to get the job done? It seems to me this could be more straightforward.
  • The call to cc_report.ambig_ovld is alarmingly long and difficult to follow! Specifically, what is the following line supposed to mean to anyone (even the author, after not working with the code for a month or two)?

cc_util.ifelse (l_numargs1 = 0, 0, 1) 

Now, I could explain to you what I am doing in this line; I could even use named notation in my call to ambig_ovld to make it more clear. The bottom line, however, is that this code is far from self-explanatory. Furthermore, I'm not happy about the need for additional comment — or about the presence of such complicated logic in the parameter list of the program. These red flags show me that the way I've designed the reporting mechanism doesn't correspond closely enough with how it needs to be used.

Time to refactor!

First of all, of course, I need to understand what I am trying to accomplish with the cc_util.ifelse logic. It comes down to this: in the special case when one of the programs I am analyzing has no parameters whatsoever, then I need to pass 0 for the starting parameter location; otherwise, the starting location is 1. Well, that seems clear enough, doesn't it? Not really. Upon closer examination of both the usage and implementation of ambig_ovld, I soon realize that I am trying to get this single program to do "double duty" — reporting both a general case of ambiguous overloading and a special case, in which neither of the overloadings have any arguments.

In this situation, a simple refactoring is called for: create a second reporting program to handle the special case. So, I create a new program in cc_report that handles no arguments and is nothing more than a direct encapsulation of the original ambig_ovld:

PROCEDURE cc_report.ambig_ovld_noargs (
   owner_in          IN   VARCHAR2,
   package_name_in   IN   VARCHAR2,
   object_name_in    IN   VARCHAR2,
   overload1_in      IN   PLS_INTEGER,
   overload2_in      IN   PLS_INTEGER,
)
IS
BEGIN
   cc_report.ambig_ovld (
      owner_in          => owner_in,
      package_name_in   => package_name_in,
      object_name_in    => object_name_in,
      overload1_in      => overload1_in,
      startarg1_in      => 0,
      endarg1_in        => 0,
      overload2_in      => overload2_in,
      startarg2_in      => 0,
      endarg2_in        => 0
   );   
END; 

All I am doing here is hiding the need to pass values of zero for start- and end-argument positions. As a result, I can offer a much-simplified parameter list in the reporting program.

With my new, special-case reporting program in place, I can go back to my original check_for_similarity program and restructure my executable section to make it simpler and easier to understand. In addition to hiding the details of reporting a special case, I decide to hide some complex parameter-comparison logic by moving it to a new local module, compare_each_parameter (Listing 3). As a result of these changes, the body of the check_for_similarity program shrinks from 37 lines (Listing 2) to the easier-to-understand 17 lines shown here:

 1  BEGIN
 2     IF l_numargs1 != l_numargs2
 3     THEN
 4        NULL;
 5     ELSIF l_numargs1 = 0 AND l_numargs2 = 0
 6     THEN
 7        cc_report.ambig_ovld_noargs (
 8           cc_smartargs.owner_name,
 9           cc_smartargs.package_name,
10           program_in,
11           overload1_in,
12           overload2_in
13          );
14     ELSE
15        compare_each_parameter;
16     END IF;
17* END;

Notice how this block of code is much more clearly expressive of the intention in the underlying logic, compared to the old version in Listing 2. I have gotten rid of the unwieldy too_similar flag. Now, my conditional logic leads directly to a resolution: either there's no need to report (because the two overloadings have a different number of arguments, hence are clearly different) or I call the appropriate reporting logic. In the special case of "no arguments," I immediately call my new reporting program. Otherwise, I move all my more-complicated logic comparing each parameter into its own local module with a clearly descriptive name: compare_each_parameter.

Looking more closely at the compare_each_parameter procedure (Listing 3), we see that it contains all the logic formerly exposed in the check_for_similarity executable section. Note that, in the process of moving the code, I have also moved from the outer program all declarations of variables that are used only in this new local program. This small step is important for avoiding potential confusion and bugs later on.

Notice also that the call to cc_report.ambig_ovld is now simpler. No more confusing calls in the middle of the parameter list to cc_util.ifelse (an in-line conditional statement discussed in Installment 7, "Crafting Service Providers: Creating a Versatile Reporting Package". The ambig_ovld procedure now directly handles the case for which it was designed (reporting on overloadings with non-trivial parameter lists), and the programmer does not need to be aware of special case-handling logic.

The final version of OverloadCheck's call to report an ambiguous overloading looks like this:

 1  cc_report.ambig_ovld (
 2     cc_smartargs.owner_name
 3     ,cc_smartargs.package_name
 4     ,program_in
 5     ,overload1_in
 6     ,1
 7     ,lastarg1_in
 8     ,overload2_in
 9     ,1
10     ,lastarg2_in
11    );

Lessons Learned

For all the care I took in building OverloadCheck, the code still provides plenty of opportunities for improvement through refactoring techniques such as streamlining the structure, simplifying how the logic is implemented, and hiding details to make executable sections more readable. I could run through dozens of refactorings and there would likely be dozens more to perform.

Should I be depressed, by this seemingly endless cycle of analysis and improvement? Not at all! If I have learned anything over all my years of close involvement with PL/SQL, it is this: there are always ways to improve upon what I did before. This is not necessarily any comment upon or criticism of my earlier work, which was (we should assume) the best I could do at that time. The important point is that I now know more than I did before — and I can act upon this new knowledge to improve my craft of software programming.

Another key lesson I learned in the process of writing OverloadCheck is the crucial role that discipline plays in building software. While I believe there is, and must be, a high degree of creativity in our work, we also need to give as much structure and consistency as possible to that creativity. Naming conventions are one way of providing that consistency, but guidelines that inform deeper structural aspects of the application are even more important.

In OverloadCheck, the main focus for my structural discipline was my fanaticism about keeping executable sections small. In the thousands of lines of code that make up OverloadCheck, you will rarely find an executable section that consists of more than 20 or 30 lines of code. The result of my efforts — during both the writing and the refactoring stages — is a body of code that is eminently readable, self-documenting, and almost transparently easy to fix and enhance.

One final lesson is that I should always be on the lookout for new features in Oracle that can improve the way my code base is written. In Oracle9i Database, for example, a new data dictionary view called ALL_PROCEDURES contains helpful information about programs, both stand-alone and within packages. Prior to Oracle9i, just a portion of this program-level information was available — and only indirectly so — from ALL_ARGUMENTS (which I used as my information source in OverloadCheck, since I didn't know about ALL_PROCEDURES). Now, this information is available more directly and much more efficiently from ALL_PROCEDURES. I invite my readers to explore how to integrate ALL_PROCEDURES into OverloadCheck, as doing so will provide an excellent exercise in refactoring!

These are the lessons I've learned in this series, as I've taken you step-by-step through the construction of a complex new utility. I hope the journey has been a similarly fruitful one for you — and that your future code benefits as much as I know mine will.

Steven Feuerstein ( steven@stevenfeuerstein.com) is one of the world's leading experts on the Oracle PL/SQL language. He is the author or co-author of nine books on PL/SQL, including Oracle PL/SQL Programming, 3rd Edition and Oracle PL/SQL Best Practices (O'Reilly; http://oracle.oreilly.com/). Steven is now developing a new tool named Swyg (www.Swyg.com) that will help developers write well-structured code such as that seen in OverloadCheck. He is a senior technology advisor for Quest Software, has been developing software since 1980, and worked for Oracle from 1987 to 1992. He is also Coordinator of the Refuser Solidarity Network, an organization that builds support for the Israeli refuser movement (www.refusersolidarity.net).

Roadmap for Building a Code-Analysis Utility

Steven Feuerstein, Oracle PS/SQL language expert, shows each step in the process of creating OverloadCheck—a utility that checks the quality of PL/SQL code. In a series of eight articles, he takes you from the beginning of the process (defining the problem he wants his utility to solve), to researching, designing, and refining this new creation until it truly works. As you watch the quality-assurance utility come to life, you'll also get exposure to some of PL/SQL's newest features and see how using best practices can benefit any coding project.

Article 1: Building a Code-Analysis Utility and Doing It Right the First Time

The first step shows what's involved in defining the goal of this utility, how to perform required analysis, and how to translate the results of that analysis into a useful form for developers.

Article 2. Getting Started, Starting with Testing

Although it would be tempting to start writing code at this stage in the process, the author shows the importance of waiting while he decides how to test his code.

Article 3. Creating a High-Level Design

In this article, see how to conceive a basic but workable architecture, while avoiding over designing.

Article 4. Implementing OverloadCheck: The Construction Phase

While showing how to get started writing code, Feuerstein demonstrates how to ensure code readability and how to minimize the number of bugs.

Article 5. Adding Smarts to the Argument Information

This article tackles some of the most complex logic involved in analyzing potential overloading ambiguities. It also shows how important it is to hide the complexity of such structures as multilevel and string-indexed collections behind procedures and functions.

Article 6. Crafting 'Service Providers': Packages with Focused Functionality

Learn by example how to identify the distinct services that relatively small units offer to higher-level packages, and how to consolidate everything related to those services.

Article 7. Crafting Service Providers: Creating a Versatile Reporting Package

Find out how to streamline code with the dynamic WHERE clause and how to put a user-friendly wrapper around unwieldy-but-useful procedures (such as DBMS_UTILITY.NAME_RESOLVE and DBMS_OUTPUT.PUT_LINE) as you explore the creation of OverloadCheck's reporting package.

Article 8. Getting it Right the Second Time

In the final stage, Feuerstein uses a test engine and his predefined test cases to do some testing. He also shows how to use the process of refactoring to fix problems and improve the inner workings of the program.

E-mail this page
Printer View Printer View
Oracle Is The Information Company About Oracle | Oracle RSS Feeds | Careers | Contact Us | Site Maps | Legal Notices | Terms of Use | Privacy