By Connor McDonald | June 2021
[This article was first published in Oracle Magazine in 2016 and has been updated for the latest version of Oracle Database. —Ed.]
My first programming job was working with COBOL in a mainframe environment in the 1980s. In those days, dealing with the case of strings was a simple affair. All our company’s code was in uppercase; all the screens presented information in uppercase; and all the reports dispatched to the noisy impact printers in the server room were printed in uppercase. It was a simple and easy-to-implement strategy. Uppercase was the only case.
As the years went on, this rudimentary approach needed to evolve as the technologies and architectures upon which we built solutions demanded a more natural feel to the display of textual data. Nowadays, users expect their content to resemble the grammatical norms of the modern world, and that includes the representation of names and proper nouns within the database and the applications sitting on that data. There are two conflicting requirements: allowing data entry in an application to be flexible in terms of the use of case and presenting that data in a consistent fashion when rendering it for output to the application.
This article will show how to use PL/SQL and SQL to address the case-management issue. By extension, you can use these techniques to handle other programmatic challenges faced by application developers.
Applications can enforce the validation of data, but an over-reliance on front-end applications enforcing data validation can reveal inconsistencies with the case of text, as shown in the following:
SQL> select surname
2 from names;
SURNAME
———————
jones
brown
SMITH
The stored data can be patched for correctness, but that obviously does not address the root cause of how the inconsistency was permitted and whether it should be permitted. Therefore, a commonly adopted compromise is to correct the data as it is read back from the database, as shown in the following:
SQL> select initcap(surname)
2 from names;
INITCAP(SURNAME)
————————————————
Jones
Brown
Smith
In many cases this is a sufficient solution, but there are some names (including that of this article’s author) that are not correctly normalized by the INITCAP
function. The following shows what happens when INITCAP
encounters a name such as McDonald:
SQL> select initcap(surname)
2 from names;
INITCAP(SURNAME)
————————————————
Jones
Brown
Smith
O'Brien
Mcdonald
The INITCAP
function is unaware of the various linguistic idiosyncrasies associated with names and does not capitalize the D
in McDonald
. Similarly, the same functionality of INITCAP
that capitalizes the first letter following an apostrophe and lets it correctly handle names such as O'Brien
creates an issue when INITCAP
is used to normalize other proper nouns that may contain apostrophes. The following INITCAP
example demonstrates possibly inappropriate formatting of possessive names, such as some common company names:
SQL> select initcap(organization_name)
2 from companies;
INITCAP(ORGANIZATION_NAME)
——————————————————————————
Oracle Corporation
McDonald'S Restaurant
Jackson'S Supplies
That’s not good.
If INITCAP
does correctly meet your requirements, you can extend the existing implementation of INITCAP
with your own version. Listing 1 presents a PL/SQL function example, called MY_INITCAP
, that caters to boundary conditions in the data, while falling back onto the existing INITCAP
implementation for normal cases.
Listing 1. INITCAP extensions added via a PL/SQL wrapper
SQL> create or replace
2 function MY_INITCAP(p_string varchar2) return varchar2 is
3 l_string varchar2(1000) := p_string;
4 begin
5 if lower(l_string) like 'mac%' then
6 l_string := 'Mac'||initcap(substr(l_string,4));
7 elsif lower(l_string) like 'mc%' then
8 l_string := 'Mc'||initcap(substr(l_string,3));
9 end if;
10
11 if l_string like '''%' then
12 null;
13 else
14 if not regexp_like(l_string,'(Mac[A-Z]|Mc[A-Z])') then
15 l_string := initcap(l_string);
16 end if;
17 if l_string like '_''S%' then
18 null;
19 else
20 l_string := replace(l_string,'''S','''s');
21 end if;
22 end if;
23
24 return l_string;
25 end;
26 /
Function created.
SQL> select my_initcap(surname)
2 from names;
MY_INITCAP(SURNAME)
———————————————————
Jones
Brown
Smith
O'Brien
McDonald
SQL> select my_initcap(organization_name)
2 from companies;
MY_INITCAP(ORGANIZATION_NAME)
—————————————————————————————
Oracle Corporation
McDonald's Restaurant
Jackson's Supplies
Building a PL/SQL function such as MY_INITCAP
that is bulletproof is not a trivial undertaking. While all the data presented so far was correctly normalized, it is easy to find boundary cases that cause the custom function to fail, as shown in the following code:
SQL> select my_initcap(problem_data)
2 from potential_problems;
MY_INITCAP(PROBLEM_DATA)
————————————————————————
Mcdonald's Restaurant
MacY's
In this case, the combination of two boundary conditions, McDonald and a trailing S, created problems, and the rendering of Macy’s demonstrates that a blanket rule of intercepting strings that are prefixed with Mc or Mac is too simplistic an approach.
You could sample more and more of your existing data and extend the MY_INITCAP
function to improve the successful conversion ratio, but as the function grows and becomes more complex, the potential of the function to have an impact on query performance grows as well.
Oracle Database’s built-in INITCAP
function, being part of the highly optimized database server’s kernel, is accessible directly from the SQL engine, whereas the customized MY_INITCAP
function is a PL/SQL function. Calling PL/SQL functions from SQL statements incurs some overhead. The overhead of a single execution of the function is negligible, but the overhead increases significantly if high execution counts are part of an application’s requirements. The following code tests the performance of the built-in INITCAP
function versus the PL/SQL version when each is called 2 million times from a SQL statement:
SQL> set timing on
SQL> select count(initcap(object_name)) from two_million_names;
COUNT(INITCAP(OBJECT_NAME))
———————————————————————————
2000000
1 row selected.
Elapsed: 00:00:00.54
SQL> select count(my_initcap(object_name)) from two_million_names;
COUNT(MY_INITCAP(OBJECT_NAME))
——————————————————————————————
2000000
1 row selected.
Elapsed: 00:00:11.24
That’s quite a difference, yes; but developers, testers, and DBAs alike can become overzealous when interpreting performance tests like these, dismissing the viability of PL/SQL functions as too slow. Consequently, blanket and too-broad rules such as “you shall make no PL/SQL calls from SQL” are sometimes created in coding standards within enterprises.
While it is true that some of the functionality of the MY_INITCAP
function can be engineered into a pure SQL solution as shown in the following code, that would sacrifice the functionality, maintainability, and modularity of having a PL/SQL solution:
SQL> select
2 case
3 when lower(surname) like 'mac%'
4 then 'Mac'||initcap(substr(surname,4))
5 when lower(surname) like 'mc%'
6 then 'Mc'||initcap(substr(surname,3))
7 when surname like '''%' then surname
8 when initcap(surname) like '_''S%' then surname
9 else replace(initcap(surname),'''S','''s')
10 end my_initcap
11 from names;
MY_INITCAP
———————————
Jones
Brown
Smith
McDonald
In other words: The performance of code is only one facet of its suitability as a solution to an enterprise requirement.
Over the past decade, the distinction between SQL and PL/SQL in Oracle Database has continually been reduced, allowing for a new type of PL/SQL functions: functions bound solely to the SQL statement within which they are intended to be executed.
If the MY_INITCAP
PL/SQL function exists solely to be called within a single SQL statement, Oracle Database allows me to embed that function directly within the SQL statement itself and not introduce clutter into the list of compiled PL/SQL objects stored within the database.
Whereas the pure SQL solution required convoluted CASE
statements, embedding a PL/SQL function within a SQL statement retains the modularity and flexibility of PL/SQL without sacrificing the functionality or maintainability of the code, as shown in the following code:
SQL> WITH
2 function my_initcap(p_string varchar2) return varchar2 is
3 l_string varchar2(1000) := p_string;
4 begin
5 if lower(l_string) like 'mac%' then
6 l_string := 'Mac'||initcap(substr(l_string,4));
7 elsif lower(l_string) like 'mc%' then
8 l_string := 'Mc'||initcap(substr(l_string,3));
9 end if;
...
23
24 return l_string;
25 end;
26 select my_initcap(surname)
27 from names;
MY_INITCAP(SURNAME)
———————————————————
Jones
Brown
Smith
McDonald
The PL/SQL embedded within the SQL statement is not limited to a single function. Multiple functions and procedures are permitted, although the PL/SQL called from the final SQL statement must be a function returning a value. The following code shows how MY_INITCAP
has been segmented to improve code readability by adding a routine, IS_SCOTTISH
, to handle Mac and Mc prefixes:
SQL> WITH
2 function is_scottish(p_string varchar2) return boolean is
3 begin
4 return regexp_like(p_string,'(Mac[A-Z]|Mc[A-Z])');
5 end;
6 function my_initcap(p_string varchar2) return varchar2 is
7 l_string varchar2(1000) := p_string;
8 begin
9 if is_scottish(l_string) then
...
28 end;
29 select my_initcap(surname)
30 from names;
You can also test whether calling embedded PL/SQL from within SQL yields any performance improvements. To do this, here is a very simple standalone function within the database that returns the value 1
. The following code shows how long it takes to call that function 1 million times from a SQL statement:
SQL> create or replace
2 function F return number is
3 begin
4 return 1;
5 end;
6 /
Function created.
SQL> select sum(f)
2 from
3 ( select level from dual
4 connect by level <= 1000 ),
5 ( select level from dual
6 connect by level <= 1000 );
Elapsed: 00:00:02.04
Compare this result to the result of completing the identical task with the PL/SQL function defined within the SQL statement.
SQL> with
2 function f1 return number is
3 begin
4 return 1;
5 end;
6 select sum(f1)
7 from
8 ( select level from dual
9 connect by level <= 1000 ),
10 ( select level from dual
11 connect by level <= 1000 )
12 /
Elapsed: 00:00:00.42
In Oracle Database, defining PL/SQL functions directly within the SQL statement from which they are executed improves the speed of switching between the SQL engine and the called PL/SQL function. You no longer need to complicate your code by converting it to pure SQL syntax—and you retain the inherent modularity and maintainability of the PL/SQL language.
But what about existing production applications that call PL/SQL functions from SQL statements? Do they need refactoring to migrate each function to a WITH
clause for each SQL statement? No. But if you have existing PL/SQL functions that augment the capabilities of your application’s SQL statements, you can take advantage of the performance improvement.
In Oracle Database, a user-defined function (UDF) pragma is available to signify to the PL/SQL compiler that a function will be used primarily from the SQL engine.
The following code repeats the performance test with my simple function, this time with the UDF pragma added in line 3:
SQL> create or replace
2 function F return number is
3 pragma udf;
4 begin
5 return 1;
6 end;
7 /
SQL> select sum(f)
2 from
3 ( select level from dual
4 connect by level <= 1000 ),
5 ( select level from dual
6 connect by level <= 1000 )
7 ;
Elapsed: 00:00:00.39
The performance is equivalent to running the function in the WITH
clause of a SQL statement.
The close functional integration between the PL/SQL and SQL engines is a great boon for developers building data-centric applications. However, there has been some resistance to realizing the benefits of the modular, maintainable code achievable with PL/SQL because of performance concerns. With user-defined functions in Oracle Database, developers can attain those benefits as well as superior application and database performance.
Illustration: Wes Rowell
Connor McDonald is a former Oracle ACE Director who has joined the Oracle Developer Advocates team. Over the past 25 years, he has worked with systems in Australia, the United Kingdom, Southeast Asia, Western Europe, and the United States. He has coauthored three books and has been a popular speaker at Oracle conferences around the world, specializing in topics regarding the database engine and PL/SQL.