Refactoring for PL/SQL Developers
By Steven Feuerstein
Go beyond identifying best PL/SQL practices to create better code.
What is Refactoring?
Most PL/SQL developers stay very busy building applications. As a result, relatively few explore the wider world of software engineering, where refactoring is a well-established discipline.
Refactoring, to quote the book Refactoring: Improving the Design of Existing Code by Martin Fowler et al. (Addison-Wesley, 1999), "is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure. . . . It is a disciplined way to clean up code that minimizes the chances of introducing bugs. In essence when you refactor you are improving the design of the code after it has been written."
Who Needs Refactoring?
All professional software developers, in fact, need to refactor their code. That's not bad news; it's good news, as I will demonstrate in this article. Additional good news: Many of us already do refactor our code. We are constantly going back into existing programs—written by ourselves or by someone else—to fix a problem, enhance the program to match new requirements, make it run faster, upgrade it to use new PL/SQL features, and more.
With each of these steps, we often—but not always—improve the internal design of our code. So why introduce a new buzzword to describe what we are already doing? Refactoring goes beyond simply making recommendations for best practices; the discipline of refactoring offers step-by-step guides to making very specific improvements. These guides help us make the changes in a consistent manner, without introducing new bugs.
In this article, I am going to perform a refactoring exercise on a program that checks to see if two files are equal (have the same contents). By seeing the refactoring process in action, you will be able to better evaluate its potential impact on your own development environment.
An Initial Review of eqfiles
The eqfiles function performs a line-by-line comparison of two files to see if their contents are equivalent. The code for this function is shown in Listing 1. It is a fairly straightforward program, and it is written that way, with each step exposed in the main body of the program:
Code Listing 1: The original eqfiles function
 1  CREATE OR REPLACE FUNCTION eqfiles (
 2     check_this_in IN VARCHAR2
 3    ,check_this_dir_in IN VARCHAR2
 4    ,against_this_in IN VARCHAR2
 5    ,against_this_dir_in IN VARCHAR2 := NULL
 6  )
 7     RETURN BOOLEAN
 8  IS
 9     checkid UTL_FILE.file_type;
10     checkline VARCHAR2 (32767);
11     check_eof BOOLEAN;
12     againstid UTL_FILE.file_type;
13     againstline VARCHAR2 (32767);
14     against_eof BOOLEAN;
15     retval BOOLEAN;
16  BEGIN
17     -- Open both files, read-only.
18     checkid :=
19        UTL_FILE.fopen (check_this_dir_in
20                       ,check_this_in
21                       ,'R'
22                       ,max_linesize => 32767
23                       );
24
25     againstid :=
26        UTL_FILE.fopen (NVL (against_this_dir_in, check_this_dir_in)
27                       ,against_this_in
28                       ,'R'
29                       ,max_linesize => 32767
30                       );
31
32     LOOP
33        BEGIN
34           UTL_FILE.get_line (checkid, checkline);
35        EXCEPTION
36           WHEN NO_DATA_FOUND
37           THEN
38              check_eof := TRUE;
39        END;
40
41        BEGIN
42           UTL_FILE.get_line (againstid, againstline);
43        EXCEPTION
44           WHEN NO_DATA_FOUND
45           THEN
46              against_eof := TRUE;
47        END;
48
49        IF (check_eof AND against_eof)
50        THEN
51           retval := TRUE;
52           EXIT;
53        ELSIF (checkline != againstline) OR
54              (check_eof OR against_eof)
55        THEN
56           retval := FALSE;
57           EXIT;
58        END IF;
59     END LOOP;
60
61     UTL_FILE.fclose (checkid);
62     UTL_FILE.fclose (againstid);
63     RETURN retval;
64  END eqfiles;
Lines 17 through 30. Open both files, read-only, maximum line size possible.
Lines 32 through 59. Loop through the two files, reading the next line from each and comparing the results, until a difference is found or the end of a file is reached.
Lines 61 through 63. Close both files and return the result.
The eqfiles function is simple and short, and it gets the job done. So should I bother refactoring this program? To see why the answer is a definitive "yes!" let's review this code for possible problems and opportunities for improvement. I list below the issues that stand out for me; send a note to me at email@example.com if you see anything I missed.
My program has no exception section. Worse, I am using UTL_FILE and there is no exception section. When performing file I/O, the chance of an error is quite high. It is extremely important to handle UTL_FILE errors in the same block in which the code is executed. That way you can get as much information as possible about the error—and, as important, you can close any open files. The way this program is written now, if any error occurs while I'm reading from one of the files, the files will be left open until I terminate my session!
Presence of hard-coded literal values. I want to read from the files, so I specify 'R' as the file open mode. I want to make sure I can read long lines, so I specify the maximum, 32767, for the max_linesize parameter. Now how did I know those values? I looked them up in the documentation, or I simply remembered them from frequent use. Either way, this is not a good way to write code. The values may change over time, and I should not have to trust my memory to get them right.
Repetition of the same logic. The entire FOPEN statement to open a file is repeated, which means that the hard-coded values are also repeated. Repetition of the same or similar logic is one of the biggest problems in software. Multiple instances mean extra work to maintain, debug, and upgrade the code. There is also a greatly increased likelihood of bugs working their way into the application.
Exposed complex Boolean logic. The expression that I wrote to evaluate if the files are the same is complicated and hard to follow. If that's the way the author of the code feels about it, it isn't hard to predict how challenging a new reader of this code will find those lines.
Confusing loop structure. Linked to the complexity of the conditional logic is the loop in which the IF statement is found. I am using a simple loop, so the EXIT condition is not explicitly stated; rather, it is buried inside the loop. Furthermore, there are two different EXIT statements, which means that debugging the code will be more challenging (which EXIT path did it follow?).
An Imperfect Program—Like All Others
The eqfiles function isn't perfect, but there's nothing very surprising about that. There isn't any such thing as a perfect program. The question is how to make it better. Given the imperfections described above, I suggest that the following refactorings would apply very well to eqfiles:

1. Create local initialization and cleanup procedures.
2. Include an exception section in every block of code, or justify its absence.
3. Replace all hard-coded literal values with named constants.
4. Hide complex expressions and logic behind a function or variable.
5. Construct each loop so that there is only one way to exit.

Let's take these refactorings one at a time, applying them to eqfiles.
Create local initialization and cleanup procedures. Most executable sections of a PL/SQL block can be logically segmented into three separate phases:
Phase 1. Initialize and check: Initialize local variables, check parameters and other conditions.
Phase 2. Execute: Perform the business logic that is the reason the block exists.
Phase 3. Clean up: Close open files and cursors, and release any other resources.
Many times when we write code, all of these phases are just jumbled together inside the executable section. This approach makes that section longer and harder to read. You are much better off creating local modules to store the steps for both initialization and cleanup.
Let's look at how to apply this refactoring to eqfiles. Before I can actually run the main body of logic for eqfiles (analyzing if the lines are equal), I need to open the files successfully; this step constitutes the initialization of my program (lines 17 through 30 in Listing 1).
First I create a "stub" at the end of my declaration section:
PROCEDURE initialize
IS
BEGIN
   NULL;   -- Placeholder so the stub compiles
END initialize;
Then I copy the UTL_FILE.FOPEN calls out of the executable section and paste them into the initialize procedure:
PROCEDURE initialize
IS
BEGIN
   checkid := UTL_FILE.fopen ...
   againstid := UTL_FILE.fopen ...
END initialize;
Finally, I replace the original UTL_FILE.FOPEN lines of code with a call to initialize, and thus my executable section now looks like this:
BEGIN
   initialize;

   LOOP
      BEGIN
Now for the cleanup (lines 61 through 62 in Listing 1). Again, I create a stub:
PROCEDURE cleanup
IS
BEGIN
   NULL;   -- Placeholder so the stub compiles
END cleanup;
Next I copy the FCLOSE statements out from the end of the program and paste them into this cleanup procedure:
PROCEDURE cleanup
IS
BEGIN
   UTL_FILE.fclose (checkid);
   UTL_FILE.fclose (againstid);
END cleanup;
Then I replace the original FCLOSE statements with my call to this cleanup procedure. My executable section is now roughly 10 lines shorter and focused entirely on the core business logic. I have also set myself up to quickly and easily follow the next refactoring.
Include an exception section in every block of code, or justify its absence. There are certainly reasons for not including an exception section in a PL/SQL block, but the point is that you should have a good reason (besides "I forgot" or "I don't have time for that sort of thing").
In the case of a program that manipulates files with UTL_FILE, you should always include an exception section, for two main reasons:

1. To clean up: if an error occurs, you can close any files that are still open before the exception propagates.
2. To handle specific, expected errors: UTL_FILE raises its own exceptions, such as UTL_FILE.INVALID_OPERATION, which you can trap and act on distinctly.
Now that I have created a cleanup procedure for eqfiles, it is quite easy for me to take care of No. 1 above. I simply add an exception section to the function, call the cleanup routine, and then reraise the exception:
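In sketch form, that exception section might look like this, calling the cleanup procedure created earlier:

```sql
EXCEPTION
   -- Catch any error, release the file handles, then reraise
   -- so the caller still sees the original exception.
   WHEN OTHERS
   THEN
      cleanup;
      RAISE;
END eqfiles;
```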
Alternatively, I might decide to simply return FALSE instead of propagating the exception, because if I encounter an error, then clearly the two files are not the same.
To take care of No. 2 above, I can add an error handler for a specific exception, and I can easily call the cleanup procedure there as well. If, for example, I provide the name of a file that does not exist, Oracle will raise the UTL_FILE.INVALID_OPERATION error (which in Oracle9i Database now has its own error number: ORA-29283). So I can check for that and take a different action, as in:
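For example, a handler for that specific error might look like the following sketch (the error number and message text here are illustrative choices, not fixed requirements):

```sql
EXCEPTION
   WHEN UTL_FILE.invalid_operation
   THEN
      -- A bad filename or directory; close whatever is open
      -- and report the problem distinctly.
      cleanup;
      raise_application_error (
         -20000, 'Unable to open one of the files for comparison.');
   WHEN OTHERS
   THEN
      cleanup;
      RAISE;
END eqfiles;
```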
Replace all hard-coded literal values with named constants. I am sure this refactoring is an "old story" for most of you. We should never hard-code literal values into our programs. We all know this is a bad idea, because those values change over time but our code likely does not. Yet we all find ourselves doing it anyway. Let's go through the exercise of removing hard-coded values from eqfiles. Here are the values I want to replace with named elements of some sort:

'R', the read-only mode for opening a file
32767, the maximum line size that UTL_FILE can read
I could simply create named constants inside this function and assign values to those constants. The problem with this approach is that I may need to reference the same values in different parts of my application. Am I going to create named constants all over my application? That is scarcely much of an improvement over the existing code.
Given this situation, a much better approach is to create a single container for these values (and any other hard-coded values related to UTL_FILE) and then reference the named elements in that container throughout my application.
So to get rid of these hard-coded values, I will create a package that holds named constants for the UTL_FILE values and then change eqfiles to reference those constants in its calls to UTL_FILE.FOPEN.
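Such a container might be sketched as a small package of constants (the package and constant names here are my own, not part of UTL_FILE):

```sql
CREATE OR REPLACE PACKAGE utl_file_constants
IS
   -- File open modes accepted by UTL_FILE.FOPEN
   c_read_only    CONSTANT VARCHAR2 (1) := 'R';
   c_write_only   CONSTANT VARCHAR2 (1) := 'W';
   c_append       CONSTANT VARCHAR2 (1) := 'A';

   -- Largest line size UTL_FILE can read or write
   c_max_linesize CONSTANT PLS_INTEGER := 32767;
END utl_file_constants;
```

With a package like this in place, a call to open a file becomes self-documenting: UTL_FILE.fopen (check_this_dir_in, check_this_in, utl_file_constants.c_read_only, max_linesize => utl_file_constants.c_max_linesize), and a future change to any of these values is made in exactly one place.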
Hide complex expressions and logic behind a function or variable. This program contains a good example of tricky conditional logic (lines 49 through 58 in Listing 1). There isn't a lot of code involved, but it certainly is less than transparent:
IF (check_eof AND against_eof)
THEN
   retval := TRUE;
   EXIT;
ELSIF (checkline != againstline) OR
      (check_eof OR against_eof)
THEN
   retval := FALSE;
   EXIT;
END IF;
Allow me to translate: After I read the next lines from each file, if I have reached the end of both files at the same time, my files are identical and I should stop reading. Otherwise, if the lines are different or if I have reached the end of one file before reaching the end of the other, then my files are not identical and I should stop reading.
Exposing complex logic like this in the main body of a procedure is problematic. It is hard to understand, and it is also a major distraction from seeing the overall flow of the program. A much better approach is to hide all of this logic behind a procedure or a function.
After reading (or attempting to read) the next line from each file, I need to compare the results to see if they are identical and if I should continue to read through the files. So why not create a program just for that? See Listing 3.
Code Listing 3: The compare_latest_read procedure
PROCEDURE compare_latest_read (
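Built directly from the translation above, a sketch of the full procedure might read as follows (the parameter names are my own choices):

```sql
PROCEDURE compare_latest_read (
   files_match_out     OUT BOOLEAN
  ,keep_checking_out   OUT BOOLEAN)
IS
BEGIN
   keep_checking_out := FALSE;

   IF check_eof AND against_eof
   THEN
      -- Reached the end of both files at once: identical.
      files_match_out := TRUE;
   ELSIF checkline != againstline OR check_eof OR against_eof
   THEN
      -- Lines differ, or one file ended first: not identical.
      files_match_out := FALSE;
   ELSE
      -- The latest lines match; keep reading.
      keep_checking_out := TRUE;
   END IF;
END compare_latest_read;
```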
Now the logic is more clearly expressed. Sure, there's more code than before, but that's not such an awful thing, if the program is more readable and easier to maintain.
There is another area of exposed complexity in the body of eqfiles. Lines 33 through 47 in Listing 1 contain the logic to read the next line of each file. UTL_FILE.GET_LINE raises the NO_DATA_FOUND exception when it reads past the end of a file. I must therefore wrap each call to GET_LINE inside its own BEGIN-END block, because reaching the end of file in this case is not an error.
Rather than write this error-trapping logic twice in this block of code, again creating quite a distraction from the main event, I prefer to isolate the read-and-trap logic in its own procedure, as shown in Listing 4.
Code Listing 4: The get_next_line_from_file procedure
PROCEDURE get_next_line_from_file (
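Following that description, a sketch of the procedure might look like this (the parameter names are my own choices):

```sql
PROCEDURE get_next_line_from_file (
   file_in   IN     UTL_FILE.file_type
  ,line_out     OUT VARCHAR2
  ,eof_out      OUT BOOLEAN)
IS
BEGIN
   eof_out := FALSE;
   UTL_FILE.get_line (file_in, line_out);
EXCEPTION
   WHEN NO_DATA_FOUND
   THEN
      -- Reaching the end of file is expected here, not an error.
      line_out := NULL;
      eof_out := TRUE;
END get_next_line_from_file;
```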
With these changes, the body of the loop now looks like the code in Listing 5.
Code Listing 5: The body of the loop
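With both helper procedures in place, the body of the loop might reduce to three calls (keep_checking is a new Boolean flag, and the parameter lists shown here are assumptions of this sketch):

```sql
get_next_line_from_file (checkid, checkline, check_eof);
get_next_line_from_file (againstid, againstline, against_eof);
compare_latest_read (retval, keep_checking);
```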
This is very straightforward code, but it doesn't quite make sense. I am using a simple loop to read each line—but I no longer have an EXIT statement in my loop. This loop will run forever, which leads me to my final refactoring.
Construct each loop so that there is only one way to exit. In the original structure of my program, I used a simple loop that had two different EXIT statements (lines 52 and 57 in Listing 1):
IF (check_eof AND against_eof)
THEN
   retval := TRUE;
   EXIT;
ELSIF (checkline != againstline) OR
      (check_eof OR against_eof)
THEN
   retval := FALSE;
   EXIT;
END IF;
As a consequence, I have multiple EXIT points from the loop. This situation is generally undesirable, as it makes debugging and analysis of program flow more difficult. In this simple piece of code, two different EXIT points don't seem like a big deal (besides the fact that the logic is confusing, which we addressed in the previous refactoring). In a larger, more complex program, however, multiple EXIT points in a simple loop can be a major headache.
To change this code to have a single EXIT point, I switch to a WHILE statement:
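Assuming a new Boolean flag named keep_checking, initialized to TRUE, the loop header might be restructured like this:

```sql
WHILE (keep_checking)
LOOP
   ...   -- body of the loop
END LOOP;
```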
and then I simply need to provide the logic to set the "keep checking" flag, which I do with the compare_latest_read procedure. Applying the new loop structure to my modularized loop body, I end up with the fully refactored executable section for eqfiles shown in Listing 6.
Code Listing 6: Refactored executable section for eqfiles
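Putting the pieces together, the refactored executable section might read as follows (the helper parameter lists and the keep_checking flag are assumptions of this sketch):

```sql
BEGIN
   initialize;

   WHILE (keep_checking)
   LOOP
      get_next_line_from_file (checkid, checkline, check_eof);
      get_next_line_from_file (againstid, againstline, against_eof);
      compare_latest_read (retval, keep_checking);
   END LOOP;

   cleanup;
   RETURN retval;
EXCEPTION
   WHEN OTHERS
   THEN
      -- Close any open files, then let the error propagate.
      cleanup;
      RAISE;
END eqfiles;
```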
The Full Refactoring
The full refactoring is shown in Listing 7. The main body is short and self-explanatory. If I need the details on how to get the next line or the logic I use to compare the latest lines read from the files, I zoom in on those procedures. I have gotten rid of my hard-coded values and carefully cleaned up any messes that I have made (or, to be accurate, that the user has made by passing in bad values for filenames or directories).
Code Listing 7: The refactored eqfiles function
CREATE OR REPLACE FUNCTION eqfiles (
PROCEDURE get_next_line_from_file (
PROCEDURE compare_latest_read (
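Pulling all of the refactorings together, a complete sketch of the refactored function might read as follows (the utl_file_constants package, the helper parameter lists, and the keep_checking flag are assumptions of this sketch):

```sql
CREATE OR REPLACE FUNCTION eqfiles (
   check_this_in         IN VARCHAR2
  ,check_this_dir_in     IN VARCHAR2
  ,against_this_in       IN VARCHAR2
  ,against_this_dir_in   IN VARCHAR2 := NULL)
   RETURN BOOLEAN
IS
   checkid         UTL_FILE.file_type;
   checkline       VARCHAR2 (32767);
   check_eof       BOOLEAN := FALSE;
   againstid       UTL_FILE.file_type;
   againstline     VARCHAR2 (32767);
   against_eof     BOOLEAN := FALSE;
   retval          BOOLEAN;
   keep_checking   BOOLEAN := TRUE;

   PROCEDURE initialize
   IS
   BEGIN
      -- Open both files read-only; default the second directory
      -- to the first.
      checkid :=
         UTL_FILE.fopen (check_this_dir_in
                        ,check_this_in
                        ,utl_file_constants.c_read_only
                        ,max_linesize => utl_file_constants.c_max_linesize);
      againstid :=
         UTL_FILE.fopen (NVL (against_this_dir_in, check_this_dir_in)
                        ,against_this_in
                        ,utl_file_constants.c_read_only
                        ,max_linesize => utl_file_constants.c_max_linesize);
   END initialize;

   PROCEDURE cleanup
   IS
   BEGIN
      UTL_FILE.fclose (checkid);
      UTL_FILE.fclose (againstid);
   END cleanup;

   PROCEDURE get_next_line_from_file (
      file_in   IN     UTL_FILE.file_type
     ,line_out     OUT VARCHAR2
     ,eof_out      OUT BOOLEAN)
   IS
   BEGIN
      eof_out := FALSE;
      UTL_FILE.get_line (file_in, line_out);
   EXCEPTION
      WHEN NO_DATA_FOUND
      THEN
         -- End of file is expected here, not an error.
         line_out := NULL;
         eof_out := TRUE;
   END get_next_line_from_file;

   PROCEDURE compare_latest_read (
      files_match_out     OUT BOOLEAN
     ,keep_checking_out   OUT BOOLEAN)
   IS
   BEGIN
      keep_checking_out := FALSE;

      IF check_eof AND against_eof
      THEN
         files_match_out := TRUE;
      ELSIF checkline != againstline OR check_eof OR against_eof
      THEN
         files_match_out := FALSE;
      ELSE
         keep_checking_out := TRUE;
      END IF;
   END compare_latest_read;
BEGIN
   initialize;

   WHILE (keep_checking)
   LOOP
      get_next_line_from_file (checkid, checkline, check_eof);
      get_next_line_from_file (againstid, againstline, against_eof);
      compare_latest_read (retval, keep_checking);
   END LOOP;

   cleanup;
   RETURN retval;
EXCEPTION
   WHEN OTHERS
   THEN
      cleanup;
      RAISE;
END eqfiles;
```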
The Power of Refactoring
I have found the idea and discipline of refactoring to be very helpful. Refactoring takes us a step beyond identifying best practices by also providing a step-by-step approach to making changes, to minimize the chance of introducing bugs.
Steven Feuerstein (firstname.lastname@example.org) is an authority on the PL/SQL language. He is the author of nine books on PL/SQL (all from O'Reilly & Associates), including Oracle PL/SQL Best Practices and Oracle PL/SQL Programming. Feuerstein has developed a new active mentoring tool for developers called Qnxo, offers training on PL/SQL, and is a senior technology adviser for Quest Software.