|
PL/SQL
Crafting Service Providers: Packages with Focused Functionality
Installment 6 of Feuerstein's "Building a Code-Analysis Utility" series
by Steven Feuerstein OTN Member since 2001
Taking advantage of lower-level packages
The first five articles of this series covered substantial ground, from the basics of defining the problem I wanted my code-analysis utility to solve to immersion in some of the most complex logic the program requires. Now it's time to have a look at the lower end of the hierarchy of OverloadCheck software, where there are several packages that are relatively small, focused units. These offer services to higher-level packages. One important element of software design is to identify these distinct services and craft individual packages (or perhaps object types) that consolidate information related to those services. In the next two articles, I will explore OverloadCheck's various "service providers."
The cc_arguments package is a much smaller, simpler unit than cc_smartargs. It has a clear mission in life: to extract information from ALL_ARGUMENTS and DBMS_DESCRIBE so that the maximum, and most accurate, data about arguments is available through a clean, simple application-programming interface (API). It also provides functions for returning information about arguments.
I'll mine this package for new and interesting elements, rather than repeat lessons and techniques you've read about in earlier articles in this series. I'll focus on the following:
- Completing the definition of the data structures that cc_smartargs uses
- Using named constants to hide "magic values"
- Merging the data from these two sources
Data Structures of cc_arguments
The cc_smartargs package makes reference to two structures:
- cc_arguments.one_argument_rt: This is supposed to contain all the information for a single row from ALL_ARGUMENTS and DBMS_DESCRIBE.
- cc_arguments.arguments_tt: This is supposed to contain a list of records, each of type one_argument_rt.
I say "supposed to" because I identified the need for these structures before I actually designed them. The time to design them has finally come. Now if I were only obtaining information from ALL_ARGUMENTS, I wouldn't need one_argument_rtI could simply use ALL_ARGUMENTS%ROWTYPE to specify my record structure. Given the need to also extract information from DBMS_DESCRIBE, however, I'll have to shift gears and create my own record TYPE, as shown in Listing 1.
Whenever possible, I use the %TYPEs attribute against ALL_ARGUMENTS in order to declare the datatype, so that I avoid hard-coding (particularly of VARCHAR2 lengths). Some of these values, however, will actually come from DBMS_DESCRIBE, because (as I described in the first article of this series, in the section "What About Supplied Packages") it will be the driver.
Once I have defined my record TYPE, I can easily define a collection to hold those records:
TYPE arguments_tt IS TABLE OF one_argument_rt
INDEX BY BINARY_INTEGER;
And a collection of this type is exactly what the main program of cc_arguments returns:
FUNCTION fullset (program_in IN VARCHAR2)
RETURN arguments_tt;
This must seem incredibly straightforward to you after delving into the mysteries and wonder of cc_smartargs collection TYPEs!
Using Named Constants to Hide Magic Values
A magic value is one of those special literals that seem to pop up over and over again in an application, usually with the tagline "it will never change" or "it will always have this value." Experienced developers, of course, immediately smell a rat whenever they see or hear anything like that. It's precisely those pieces of data that seem to have some permanence that are most likely to change at the most challenging moment.
In general, you should never trust magic values. Assume that they'll change, and avoid hard-coded references to them. The solution? Simply create a named constant in your package specification. By giving the value a name, you make your code more readable and you hide the current value.
In the context of argument information, several magic values come to mind:
- The level value that indicates a top-level parameter (0)
- The position of the RETURN clause argument row (0)
- The parameter modes (IN, OUT, IN OUT)
- The indicator for the presence of a default value (1 or 0)
Here are the constants I've set up for these magic values:
c_top_level CONSTANT PLS_INTEGER := 0;
c_return_pos CONSTANT PLS_INTEGER := 0;
--
-- Parameter mode values used in DBMS_DESCRIBE
c_in CONSTANT PLS_INTEGER := 0;
c_out CONSTANT PLS_INTEGER := 1;
c_inout CONSTANT PLS_INTEGER := 2;
--
-- Indicators for presence of default value
c_has_default CONSTANT PLS_INTEGER := 1;
c_has_no_default CONSTANT PLS_INTEGER := 0;
Let's take a look at a couple of cc_arguments programs that reference these constants, so that you can see how using such elements makes the code easier to follow. The is_return_clause function accepts a single row of information and returns TRUE if the position of the argument matches the return position:
FUNCTION cc_arguments.is_return_clause (
arg_in IN one_argument_rt)
RETURN BOOLEAN
IS
BEGIN
RETURN arg_in.POSITION = c_return_pos;
END;
And if I need to find out if an argument row corresponds to a top-level parameter, I don't need to remember that the level must be 0. I just call the appropriate function:
FUNCTION cc_arguments.is_toplevel_parameter (
arg_in IN one_argument_rt)
RETURN BOOLEAN
IS
BEGIN
RETURN arg_in.data_level = c_top_level;
END;
You may look at these tiny programs and be tempted to say, "Why bother creating a function? It's just a single-line conditional expression. This is a waste of my time!" Recognize this voice as that of the Programming Goblin, a mischievous creature whose point in life is to convince us to take shortcuts, ostensibly in order to get our jobs done more quickly or effectively. Nothing could be farther from the truth. Shortcuts in software inevitably lead to more bugs and brittle software that breaks easily and is hard to put back together.
Let's look at one final application of these named constants: the parameter modes. DBMS_DESCRIBE returns a parameter mode as one of three values: 0 for 1, 1 for OUT, and 2 for IN OUT. I sure don't want to have to remember that, so I created the named constants. Fine, but that's not enough in this case. I also need some way to translate those numeric values into the mode names (suppose, for example, that I want to generate valid headers for programs from the argument information).
Within cc_smartargs, I do this by creating a collection to hold the names and then offer a function to perform the translation. Here's my collection type and instantiated list:
CREATE OR REPLACE PACKAGE BODY cc_arguments
IS
TYPE mode_names_t IS TABLE OF VARCHAR2(6)
INDEX BY BINARY_INTEGER;
g_mode_names mode_names_t;
After I've created this collection, I populate it in the initialization section of the package:
BEGIN
g_mode_names (c_in) := 'IN';
g_mode_names (c_out) := 'OUT';
g_mode_names (c_inout) := 'IN OUT';
END cc_arguments;
Finally, here's my very simple translation function:
FUNCTION mode_name (code_in IN PLS_INTEGER)
RETURN VARCHAR2
IS
BEGIN
RETURN g_mode_names (code_in);
EXCEPTION
WHEN NO_DATA_FOUND
THEN
RETURN NULL;
END;
Merging ALL_ARGUMENTS and DBMS_DESCRIBE Data
The most important program in cc_arguments is fullset, which returns the collection of merged-argument information. Let's take a look at the highlights of this program's implementation. Listing 2 shows the steps of the main executable section.
In line 2, I rely on cc_names to get the name elements (owner, package name, and program name) for the specified code object. In lines 4 through 7, I open up a cursor that will identify all the distinct program names for that object. I need to do this because DBMS_DESCRIBE works with only one specific program at a time; you can't pass it an entire package and retrieve descriptions of all the different programs in that package.
For each program identified by the query, I call DBMS_DESCRIBE (I've hidden details behind get_dbms_describe_info), which deposits its results into a series of collections (see cc_arguments.pkb within the overload_check.zip file for the details). Then, for each row in those arrays, using the l_argument_name array as the driver, I transfer the data into the arguments_tt collection that will be returned by the fullset function.
The transfer_data program consists of two steps:
BEGIN
copy_to_arguments_array (rec, indx, l_argindx);
add_all_arguments_info (retval (l_argindx));
END transfer_data;
The first program simply moves the data over. The second program recognizes that DBMS_DESCRIBE doesn't give me everything I need. It looks up the corresponding row in ALL_ARGUMENTS and fills out the missing fields in the collection's row (see Listing 3), where onerow is another function in cc_arguments that retrieves a single row (check out the details in the source code).
I have now moved through the implementation (or at least the highlights) of OverloadCheck, cc_smartargs, and cc_arguments. So, what's left?
- cc_report
- cc_names
- cc_types
(Oh yes, and then there's the actual testing of all this code to see if it works! But that's an item of discussion to save for the final article of this seriesInstallment 8 which will come in all good time.)
Encapsulating Datatype Information
The cc_types package is one of my atomic, low-level units. Its mission is to encapsulate knowledge (some of it quite arcane) about the various datatypes supported by Oracle in general and PL/SQL more specifically.
Table 1 shows the programs I've already identified a need for in cc_smartargs and cc_arguments.
At the top of the package specification, you'll see a long list of named constants that serve a similar purpose to those in cc_smartargs. In this case, I'm giving names to datatype values, as in:
c_date CONSTANT PLS_INTEGER := 12;
OverloadCheck uses these constants to populate a collection, which the cc_types.name function uses to translate from code to name. In fact, almost all of the cc_types functions take advantage of collections to store and then easily retrieve various kinds of type-related information.
Let's look at a simple example of this collection application, the cc_types.is_composite_type function, and then we'll explore the most complicated aspect of cc_types finding out if two datatypes are in the same family.
Using a Collection to Store Knowledge
The implementation of cc_types.is_composite_type is simple:
FUNCTION is_composite_type (type_in IN PLS_INTEGER)
RETURN BOOLEAN
IS
BEGIN
RETURN c_composite_types.EXISTS (type_in);
END;
In other words, you pass it the datatype code and if a row for that numeric code exists in the c_composite_types collection, then it's a composite datatype. I declare a collection type and collection at the top of my package body:
TYPE booleans_tt IS TABLE OF BOOLEAN INDEX BY BINARY_INTEGER;
c_composite_types booleans_tt;
I use a table of Booleans because I don't need to store anything more than an indicator in this collection's rows; I use the code as the row number and the rest flows from that. I declare it with a c_ prefix because I want it to be treated as a constant, even if I cannot declare it as one (to do so, I would need to initialize it with a function defined in another package; it's just not worth the added complexity).
I create a program for loading up the collection:
PROCEDURE load_composite_types
IS
BEGIN
c_composite_types (c_record) := TRUE ;
c_composite_types (c_indexby_table) := TRUE ;
c_composite_types (c_object_type) := TRUE ;
c_composite_types (c_nested_table) := TRUE ;
c_composite_types (c_varray) := TRUE ;
END;
And I call that procedure in the initialization section of the package, along with several other collection-populating programs (we'll look at load_in_same_family in the next section):
BEGIN
load_in_same_family;
load_trouble_types;
load_type_translators;
load_composite_types;
END cc_types;
I don't need to remember which of the datatypes correspond to composites. And if Oracle introduces a new datatype that is a composite, I'll simply add a single assignment to the load_composite_types procedure, and is_composite_type will recognize it.
Same-Family Analysis in cc_types
To identify potentially ambiguous overloadings, I need to compare the datatypes of corresponding parameters. If the parameters are of the same datatype, there's a potential ambiguity. But even if they are not the same exact datatype, I may have a problemthe datatypes need only be in the same "family", as is the case with NUMBER and INTEGER.
The question of whether two datatypes are in the same family naturally falls within the scope of cc_types, and cc_types.in_same_family is the function that answers the question. But now I face the question, How should I implement this function?
Maybe I could do something just like the cc_types.is_composite_type function, namely:
FUNCTION in_same_family (
type1_in IN PLS_INTEGER, type2_in IN PLS_INTEGER)
RETURN BOOLEAN
IS
BEGIN
RETURN same_family.EXISTS (type1_in, type2_in);
END;
Well, this looks appealing on the surface, but it won't workat least not like this. I must compare two different types, but Oracle does not yet support true multidimensional arrays, which is implied by the syntax same_family.EXISTS (type1_in, type2_in) or same_family (type1_in, type2_in).
What's a developer to do? Emulate! Even if I cannot create a real two-dimensional array, I can achieve much the same effect with a nested collection: one inner collection in each row of the outer collection. Here are the necessary declarations:
TYPE booleans_tt IS TABLE OF BOOLEAN
INDEX BY BINARY_INTEGER;
TYPE type_families_tt IS TABLE OF booleans_tt
INDEX BY BINARY_INTEGER;
type_families type_families_tt;
With this collection, I can rewrite my first attempt at in_same_family as follows:
FUNCTION in_same_family (
type1_in IN PLS_INTEGER, type2_in IN PLS_INTEGER)
RETURN BOOLEAN
IS
BEGIN
RETURN type_families (type1_in).EXISTS (type2_in);
END;
At first glance this looks OK, but what if the row for type1_in is not defined? The function will raise the NO_DATA_FOUND exception (something I can avoid by using the EXISTS method). So, I really should do this:
FUNCTION in_same_family (
type1_in IN PLS_INTEGER, type2_in IN PLS_INTEGER)
RETURN BOOLEAN
IS
BEGIN
RETURN type_families (type1_in).EXISTS (type2_in);
EXCEPTION
WHEN NO_DATA_FOUND THEN RETURN FALSE;
END;
This code requires, however, that the original developer (and, more important, the person performing code maintenance) understand the implications of the exception section. I should probably add a comment to explain. But whenever I find myself saying, "I should explain that with a comment," I stop myself and ask this question: Is there a way to rewrite that logic so that the comment is unnecessaryso that the code is self-documenting?
In this case, I think there is. Consider the following implementation:
FUNCTION in_same_family (
type1_in IN PLS_INTEGER, type2_in IN PLS_INTEGER)
RETURN BOOLEAN
IS
BEGIN
IF NOT type_families.EXISTS (type1_in)
THEN
RETURN FALSE;
ELSE
RETURN type_families (type1_in).EXISTS (type2_in);
END IF;
END;
I no longer have to worry about NO_DATA_FOUND, because I use EXISTS to check for the presence of both rows. And now I'm saying explicitly with the code itself, "If the row for the first type doesn't exist, return FALSE. Otherwise, return TRUE if the row for the second type also exists." In other words, I have flagged that combination as too similar. I hope you agree that this is a superior implementation.
I still, however, have to figure out how to populate this two-dimensional (sort of) array. Let's take a look at a concrete example: the string family. All of the following are string datatypes that will be indistinguishable to the compiler: VARCHAR2, VARCHAR, CHAR, LONG, NVARCHAR2, and NCHAR.
I need to fill up the array so that
- Each datatype is registered as being in the same family as itself.
- Each datatype is registered as being in the same family as all the others.
Let's take it a step at a time. For any two datatype codes, I can cover all the permutations for the above two conditions with the program in Listing 4. Lines 7 and 8 make sure that a datatype is marked as being in the same family as itself. Lines 9 and 10 build the list of families for different datatypes. Simple enough, but now how do I put this procedure to use? How can I tell it which combinations I want loaded in a way that's concise and readable?
Suppose I build a list of all those datatypes that are in the same family (I'll call it one_family_tt). I can then run through all the permutations as follows:
PROCEDURE load_one_family (family_in IN one_family_tt)
IS
BEGIN
FOR o_index IN family_in.FIRST .. family_in.LAST
LOOP
FOR i_index IN family_in.FIRST .. family_in.LAST
LOOP
load_permutations (o_index, i_index);
END LOOP;
END LOOP;
END load_one_family;
And now I'm ready to complete my implementation of the program that will load up the two-level collection used by in_same_family (see Listing 5). Table 2 provides an explanation of the code, which should make it easy for you to follow the logic.
I would say that this code is self-explanatory. Is it easy to maintain? Well, if I need to add another datatype to the string family, I simply add an element to the constructor list. If I need to add an entirely new family, then I insert another invocation of load_one_family, with the appropriate list of datatypes.
I have now completed an explained my implementation of the logic required to find out whether two datatypes are in the same family, and I've shown the highlights of the cc_types package. In the next article, I'll explain how to take advantage of the very useful DBMS_UTILITY.NAME_RESOLVE procedure and explore the separate reporting mechanism for OverloadCheck. I'll also show you a generic error-handling package that includes an assertion mechanism and a smart RAISE procedure.
Steven Feuerstein ( steven@stevenfeuerstein.com) is one of the world's leading experts on the Oracle PL/SQL language. He is the author or co-author of nine books on PL/SQL, including Oracle PL/SQL Programming, 3rd Edition and Oracle PL/SQL Best Practices (O'Reilly; http://oracle.oreilly.com/). Steven is a senior technology advisor for Quest Software, has been developing software since 1980, and worked for Oracle from 1987 to 1992. He is also past president of the board of directors of the Crossroads Fund, which makes grants to Chicago-area community groups working for social, racial, and economic justice ( www.CrossroadsFund.org).
|
Roadmap for Building a Code-Analysis Utility
Steven Feuerstein, Oracle PS/SQL language expert, shows each step in the process of creating OverloadChecka utility that checks the quality of PL/SQL code. In a series of eight articles, he takes you from the beginning of the process (defining the problem he wants his utility to solve), to researching, designing, and refining this new creation until it truly works. As you watch the quality-assurance utility come to life, you'll also get exposure to some of PL/SQL's newest features and see how using best practices can benefit any coding project.
Article 1: Building a Code-Analysis Utility and Doing It Right the First
Time
The first step shows what's involved in defining the goal of this utility, how to perform required analysis, and how to translate the results of that analysis into a useful form for developers.
Article 2. Getting Started, Starting with Testing
Although it would be tempting to start writing code at this stage in the process, the author shows the importance of waiting while he decides how to test his code.
Article 3. Creating a High-Level Design
In this article, see how to conceive a basic but workable architecture, while avoiding over designing.
Article 4. Implementing OverloadCheck: The Construction Phase
While showing how to get started writing code, Feuerstein demonstrates how to ensure code readability and how to minimize the number of bugs.
Article 5. Adding Smarts to the Argument Information
This article tackles some of the most complex logic involved in analyzing potential overloading ambiguities. It also shows how important it is to hide the complexity of such structures as multilevel and string-indexed collections behind procedures and functions.
Article 6. Crafting 'Service Providers': Packages with Focused Functionality
Learn by example how to identify the distinct services that relatively small units offer to higher-level packages, and how to consolidate everything related to those services.
Article 7. Crafting Service Providers: Creating a Versatile Reporting Package
Find out how to streamline code with the dynamic WHERE clause and
how to put a user-friendly wrapper around unwieldy-but-useful
procedures (such as DBMS_UTILITY.NAME_RESOLVE and
DBMS_OUTPUT.PUT_LINE) as you explore the creation of OverloadCheck's
reporting package.
Article 8. Getting it Right the Second Time
In the final stage, Feuerstein uses a test engine and his predefined test cases to do some testing. He also shows how to use the process of refactoring to fix problems and improve the inner workings of the program.
|
Table 1: Programs needed by cc_smartargs and cc_arguments
|
Program | Purpose |
FUNCTION NAME (code_in IN PLS_INTEGER)
RETURN VARCHAR2 | Translates a datatype code into a name |
FUNCTION is_composite_type (
type_in IN PLS_INTEGER)
RETURN BOOLEAN | Returns TRUE if the specified datatype code indicates a nonscalar datatype |
FUNCTION is_record_type (
type_in IN PLS_INTEGER)
RETURN BOOLEAN | Returns TRUE if the specified datatype code indicates a record TYPE |
FUNCTION is_rowtype (
type_in IN PLS_INTEGER,
type_subname_in in varchar2)
RETURN BOOLEAN | Returns TRUE if the specified datatype code indicates a record TYPE defined with %ROWTYPE |
FUNCTION in_same_family (
type1_in IN PLS_INTEGER,
type2_in IN PLS_INTEGER)
RETURN BOOLEAN | Returns TRUE if the two datatypes are members of the same family |
Table 2: An explanation of the code in Listing 5
|
Line(s) | Significance |
| 3 | Declare a nested table (the first I've used for OverloadCheck), to hold the list of datatype codes for a single family. |
| 5-6 | The implementations of the two local procedures I previously explained. |
| 8-14 | I call load_one_family to load up all the necessary permutations for the family of string datatypes. Notice that because I've decided to go with a nested table, I can use the constructor for that TYPE directly in my call to load_one_family. I don't have to declare a local variable to hold the collection. I simply pass the different type codes to the constructor; it returns the nested table; and then that table is passed on to the program, to load up all permutations in the type_families collection. |
| 15 | The ellipses (...) indicate repeated calls to load_one_family for each of the different families of datatypes that can produce ambiguous overloadings (such as date, timestamp, and number). See the source code for full details. |
|