Using Regular Expressions
Using Regular Expressions
In this tutorial, you learn how to use regular expression support.
This new feature is introduced in Oracle Database 10g.
Approximately 30 minutes
This tutorial covers the following topics:
Place the cursor over this
icon to load and view all the screenshots for this tutorial. (Caution: Because
this action loads all screenshots simultaneously, response time may be slow
depending on your Internet connection.)
Note: Alternatively, you can place the cursor over
each individual icon in the following steps to load and view only the screenshot
associated with that step.
In Oracle Database 10g,
you can use both SQL and PL/SQL to implement regular expression support. Regular
expressions are a method of describing both simple and complex patterns for
searching and manipulating. String manipulation and searching contribute to
a large percentage of the logic in a Web-based application. Usage ranges from
the simple (for example, find the word San Francisco in a specified text) to the complex
(for example, extract all URLs from the text) to the more complex (for instance, find all words in which
every second character is a vowel).
Oracle Database 10g
introduces support for regular expressions. The implementation complies with
the Portable Operating System for UNIX (POSIX) standard, issued by the Institute
of Electrical and Electronics Engineers (IEEE), for ASCII data-matching semantics
and syntax. Oracle's multilingual capabilities extend the matching capabilities
of the operators beyond the POSIX standard.
When coupled with native SQL, the use of regular expressions
makes it possible to perform powerful search and manipulation operations on
any data stored in an Oracle database. You can use this feature to easily solve
problems that would otherwise be very complex to program.
Back to Topic List
Before starting this tutorial, you should have:
Back to Topic List
Matching Mechanism
If you have a string aabcd
and you specify a search for a(b|c)d,
the search looks for a followed
by either b or c,
which is then followed by d.
Regular Expression: 'a(b|c)d' String to Match: 'aabcd'
|
| a |
a |
b |
c |
d |
Description |
Result |
| * |
|
|
|
|
Look for a
and succeed
|
Match |
| |
* |
|
|
|
Look for b
and fail |
No match |
| |
* |
|
|
|
Look for c
and fail, reset, and advance |
No match |
| |
* |
|
|
|
Look for a
and succeed
|
Match |
| |
|
* |
|
|
Look for b
and succeed; remember c as
alternative |
Match |
| |
|
|
* |
|
Look for d
and fail |
No match |
| |
|
* |
|
|
Look for c
as last remembered alternative and fail, reset, and advance |
No match |
| |
|
* |
|
|
Look for a
and fail, reset, and advance |
No match |
| |
|
|
* |
|
Look for a
and fail, reset, and advance
|
No match |
| |
|
|
|
* |
Look for a
and fail, reset, and advance
|
No match |
Given the string aabcd,
a(b|c)d does not match it.
To implement regular expression support in either SQL or PL/SQL,
you use a new set of functions. These functions are:
| Function Name |
Description |
| REGEXP_LIKE |
Similar to the LIKE
operator, but performs regular expression matching instead of simple pattern
matching |
| REGEXP_INSTR |
Searches for a given string for a regular
expression pattern and returns the position were the match is found |
| REGEXP_REPLACE |
Searches for a regular expression pattern
and replaces it with a replacement string |
| REGEXP_SUBSTR |
Searches for a regular expression pattern
within a given string and returns the matched substring |
Metacharacters
Metacharacters are special characters that have a special
meaning, such as a wild card character, a repeating character, a nonmatching
character, or a range of characters.
You can use several predefined metacharacter symbols in the
pattern matching with the functions.
| Symbol |
Description |
*
|
Matches zero or more occurrences
|
|
|
Alternation operator for specifying alternative matches
|
^/$
|
Matches the start of line and the end of line
|
[]
|
Bracket expression for a matching list matching any
one of the expressions represented in the list
|
[^exp]
|
If the caret is inside the bracket, it negates the expression.
|
{m}
|
Matches exactly m times
|
{m,n}
|
Matches at least m times but no more than n
times
|
[: :]
|
Specifies a character class and matches any character
in that class
|
\
|
Can have four different meanings: (1) stand for itself;
(2) quote the next character; (3) introduce an operator; (4) do nothing
|
+
|
Matches one or more occurrences
|
?
|
Matches zero or one occurrence
|
.
|
Matches any character in the supported character set
(except NULL)
|
()
|
Grouping expression (treated as a single subexpression)
|
\n
|
Backreference expression
|
[==]
|
Specifies equivalence classes
|
[..]
|
Specifies one collation element (such as a multicharacter
element)
|
Back to Topic List
The following examples demonstrate the use of regular
expression functions. Perform the following steps:
| 1. |
In a terminal window, change to the /home/oracle/wkdir
directory and start SQL*Plus.
Connect to Oracle with user ID oe/oe
and password oe/oe.
cd /home/oracle/wkdir
sqlplus oe/oe
|
| 2. |
Examine the syntax of the REGEXP_LIKE
function:
REGEXP_LIKE(srcstr, pattern [,match_option])
In this function:
srcstr: Search value
pattern: Regular expression
match_option: Option to change default matching. Can include one or more
of the following values:
'c': Use case-sensitive matching (default)
'i': Use case-insensitive matching
'n': Allow match-any-character operator
'm': Treat source string as multiple line
To locate all products with names containing SSP/S,
SSP/V, SSS/V,
or SSS/S in the PRODUCT_NAME
column from the PRODUCT_INFORMATION
table, execute the following script:
@relike.sql
The relike.sql
script contains the following SQL:
SELECT product_name
FROM oe.product_information
WHERE regexp_like (product_name, 'SS[PS]/[VS]');

|
| 3. |
The REGEXP_INSTR
function returns the position of a given pattern within a string. Examine
the syntax:
REGEXP_INSTR(srcstr, pattern [, position [, occurrence
[, return_option [, match_option]]]])
In this function:
position:
Search starting position
occurrence: Occurrence
to search for
return_option: Indicate
the start or end position of occurrence
match_option: Option to
change default matching. Can include one or more of the following values:
'c':
Use case-sensitive matching (default)
'i': Use case-insensitive
matching
'n': Allow
match-any-character operator
'm': Treat
source string as multiple line
To search the product names to find the location
of the first nonalphabetic character (regardless of whether it is uppercase
or lowercase), execute the following script:
@reinstr.sql
The reinstr.sql
script contains the following SQL:
COLUMN non_alpha FORMAT 9999999999 SELECT product_name, REGEXP_INSTR(product_name, '[^[:alpha:]]') non_alpha FROM oe.product_information ;

Note that [^[:<class>:]]
implies a character class and matches any character from within that class;
[:alpha:] matches any alphabetic
character. In this case, you are negating this expression by using ^.
|
| 4. |
The REGEXP_SUBSTR
function returns a given string based on a pattern of occurrence. Examine
the syntax:
REGEXP_SUBSTR(srcstr, pattern [, position
[, occurrence [, match_option]]])
In this function:
position:
Search starting position
occurrence: Occurrence
to search for
match_option: Option to
change default matching. Can include one or more of the following values:
'c': Use case-sensitive matching (default)
'i': Use case-insensitive
matching
'n': Allow
match-any-character operator
'm': Treat
source string as multiple line
To extract the e-mail names from the CUSTOMERS
table, extract only those e-mail names for customers located in Switzerland.
To do this, return the contents in the CUST_EMAIL
column that precedes the @ symbol for customers with an NLS_TERRITORY
of Switzerland. Execute the following script:
@resubstr.sql
The resubstr.sql
script contains the following SQL:
SELECT REGEXP_SUBSTR(cust_email, '[^@]+') FROM oe.customers WHERE nls_territory = 'SWITZERLAND' ;

Note that in this example, the result returns the first
substring that does not have the @ symbol.
|
| 5. |
The REGEXP_REPLACE
function returns a "replaced" substring from a given string.
Examine the syntax:
REGEXP_REPLACE(srcstr, pattern [,replacestr [, position
[, occurrence [, match_option]]]])
In this function:
position:
Search starting position
occurrence: Occurrence
to search for
replacestr: Character string
replacing pattern
match_option: Option to
change default matching. Can include one or more of the following values:
'c': Use case-sensitive matching (default)
'i': Use case-insensitive
matching
'n': Allow
match-any-character operator
'm': Treat
source string as multiple line
To return information from the CATALOG_URL
column in the PRODUCT_INFORMATION
table, you can perform a full scan on the column. But this would result
in hundreds of rows being returned, because it lists a specific HTML page location
within a number of catalog domains. However, in this example, you want
to find only the individual domain names themselves and not the lower-level
pages they contain. To find the domain names without all the extraneous
information, use the REGEXP_REPLACE
function. Execute the following script:
@rereplace.sql
The rereplace.sql
script contains the following SQL:
SELECT UNIQUE REGEXP_REPLACE (catalog_url, 'http://([^/]+).*', '\1') FROM oe.product_information ;

Here is an explanation of how the string was processed:
| http:// |
The expression starts by
looking for this string literal; there are no special metacharacters
here. |
| ([^/]+) |
Then, the expression looks for a series of characters
provided that they are "not" slash (/).
|
| .* |
The expression finishes by consuming the rest
of the string with this part of the expression.
|
| \1 |
The matching expression is replaced with backreference
1, which is whatever was matched between the first set of parentheses.
|
|
Back to Topic List
Regular expression functions support multilingual capabilities
and can be used in locale-sensitive applications. To combine the use of regular
expressions with Oracle's NLS Language feature, perform the following steps:
| 1. |
Execute the following script to find the product description
in the Portuguese language:
@multiport.sql
The multiport.sql
script contains the following SQL:
SELECT regexp_substr(to_char(translated_name), '^[a-z]+')
FROM oe.product_descriptions
WHERE language_id = 'PT'
AND translated_name like 'G%' ;
Note that the data is not displayed.

The ^ is outside the bracket, which means that you are
searching for any strings or substrings that start with any character
from a to z.
|
| 2. |
Perform the same query, but this time use the case-insensitive
'i' switched on. Execute
the following script:
@multiport2.sql
The multiport2.sql
script contains the following SQL:
SELECT regexp_substr(to_char(translated_name), '^[a-z]+', 1, 1, 'i')
FROM oe.product_descriptions
WHERE language_id = 'PT'
AND translated_name like 'G%' ;

|
| 3. |
The results are still incomplete because the returned
strings are trimmed as soon as a non-English character is encountered.
This is because the range [a-z]
is sensitive to NLS_LANGUAGE.
You thus need to set the NLS_LANGUAGE
parameter appropriately to return the complete results. Execute the following
query:
@multiport3.sql
The multiport3.sql
script contains the following SQL:
ALTER SESSION SET NLS_LANGUAGE=PORTUGUESE;
SELECT regexp_substr(to_char(translated_name), '^[a-z]+', 1, 1, 'i')
FROM oe.product_descriptions
WHERE language_id = 'PT'
AND translated_name like 'G%' ;

|
| 4. |
The final step is to view the results in both English
and Portuguese to ensure that the translation has taken place. Execute
the following script:
@multiport4.sql
The multiport4.sql
script contains the following SQL:
SELECT REGEXP_SUBSTR(i.product_name, '^[a-z]+', 1, 1, 'i') || ' = ' || regexp_substr(to_char(d.translated_name), '^[a-z]+', 1, 1, 'i') FROM oe.product_descriptions d, oe.product_information i WHERE d.language_id = 'PT' AND d.translated_name like 'G%' AND i.product_id = d.product_id ;
ALTER SESSION SET NLS_LANGUAGE=AMERICAN;

|
Back to Topic List
To use regular expressions in check constraints, perform the
following steps:
| 1. |
Adding a check constraint on the CUST_EMAIL
column of the CUSTOMERS
table ensures that only strings containing an @
symbol are accepted. Execute the following script:
@chkemail.sql
The chkemail.sql
script contains the following SQL:
INSERT INTO customers VALUES
(9999,'Christian','Patel',
cust_address_typ ('1003 Canyon Road','87501',
'Santa Fe','NM','US'),
phone_list_typ ('+1 505 243 4144'),'us','AMERICA','100',
'ChrisP+creme.com', 149, null, null, null, null, null) ;
Because there was no validation being performed, an
e-mail address not containing an @ symbol was accepted. Perform a rollback
before starting the next step.
ROLLBACK ;

|
| 2. |
Implement the constraint by executing the following
script:
@chkemail2.sql
The chkemail2.sql
script contains the following SQL:
ALTER TABLE customers
ADD CONSTRAINT cust_email_addr
CHECK(REGEXP_LIKE(cust_email,'@'))NOVALIDATE ;

|
| 3. |
Test the constraint by executing @chkemail.sql
again.

The check constraint is violated because the e-mail
address does not contain the required symbol. The NOVALIDATE
clause ensures that existing data is not checked.
|
| 4. |
Remove the constraint by executing the following script:
@chkemail3.sql
The chkemail3.sql
script contains the following SQL:
ALTER TABLE customers DROP CONSTRAINT cust_email_addr ;

|
Back to Topic List
In this tutorial, you've learned how to:
 |
Perform a variety of searches to access data
by using regular expressions |
 |
Access data in other languages by using regular
expressions |
Back to Topic List
 |
To learn more about
Oracle Database 10g, refer to additional OBEs on the OTN
Web site. |
 |
To ask a question about this OBE tutorial,
post a query on the OBE
Discussion Forum. |
Back to Topic List
Place the cursor on this icon to hide all screenshots.
|