RegEx Split
The RegEx Split processor provides a way to split up the data in an attribute into an array, using a regular expression to define where the splits should occur.
Regular expressions are a standard technique for expressing patterns and manipulating Strings, and are very powerful once mastered.
Tutorials and reference material about regular expressions are available on the Internet and in books.
There are also software packages available to help you master regular expressions, such as RegExBuddy, and online libraries of useful regular expressions, such as RegExLib.
Use RegEx Split where you need a more advanced way of splitting data than fixed delimiters allow. For example, you may wish to split the data wherever any one of a set of characters occurs, or wherever a variable-length run of characters from a set occurs.
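The two cases above can be sketched outside the product. The following is a minimal illustration in Python using the standard `re` module, not the processor's own implementation; the sample strings and the variable names `by_set` and `by_run` are invented for the example.

```python
import re

# Case 1: split wherever any one of a set of characters occurs
# (here: comma, semicolon or slash).
by_set = re.split(r"[,;/]", "red,green;blue/black")
print(by_set)  # ['red', 'green', 'blue', 'black']

# Case 2: split on a variable-length run of characters from a set
# (here: one or more spaces or hyphens).
by_run = re.split(r"[ -]+", "north - south--east   west")
print(by_run)  # ['north', 'south', 'east', 'west']
```

A plain delimiter split could handle neither case with a single setting, which is the situation the processor is intended for.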
The processor takes a single String attribute as input.
| Option | Type | Purpose | Default Value |
| Regular expression | Regular expression | The regular expression to be used as a delimiter to split the data | None |
| Data attribute | Type | Purpose | Value |
| RegExSplit | Derived | A new Array attribute with the result of the RegEx Split | The result of the RegEx split. Note that the data that matched the regular expression itself acts as a delimiter, and so does not appear in the array. |
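To illustrate the point that matched text acts as a delimiter and is left out of the array, here is a small Python sketch. It is an analogy only: the input value is hypothetical, and Python's `re.split` is not the engine the processor uses. It also shows a Python-specific caveat, namely that a capturing group changes this behaviour in Python.

```python
import re

value = "alpha123beta45gamma"  # hypothetical input value

# The digit runs act as delimiters and are dropped from the result.
dropped = re.split(r"[0-9]+", value)
print(dropped)  # ['alpha', 'beta', 'gamma']

# Python-specific caveat: a capturing group makes re.split keep the
# delimiters, which is NOT the behaviour described for the RegExSplit array.
kept = re.split(r"([0-9]+)", value)
print(kept)  # ['alpha', '123', 'beta', '45', 'gamma']
```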
| Flag attribute | Purpose | Possible Values |
| RegExSplitSuccess | To indicate whether the RegEx Split was successful or not | Y/N |
| Execution Mode | Supported |
| Batch | Yes |
| Real time Monitoring | Yes |
| Real time Response | Yes |
The RegEx Split processor produces a summary view of its results, showing the following statistics:
| Statistic | Meaning |
| Success | The number of records which were split using the regular expression. |
| Failure | The number of records which were not split using the regular expression. |
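The relationship between the Y/N flag and the Success and Failure counts can be sketched as follows. This is a rough Python model under two stated assumptions: that a record counts as split when the expression matches at least once, and that an unmatched value is returned as a single-element array. The function name `split_with_flag` and the sample records are hypothetical.

```python
import re
from collections import Counter

def split_with_flag(pattern, value):
    """Return (parts, flag) for one record.

    Assumption: the flag is 'Y' when the pattern matches at least once,
    otherwise 'N' and the value is returned unsplit.
    """
    if re.search(pattern, value) is None:
        return [value], "N"
    return re.split(pattern, value), "Y"

records = ["a1b22c", "no digits here", "x9y"]  # invented sample data
results = [split_with_flag(r"[0-9]+", r) for r in records]

# Tally the flags into the two summary statistics.
flags = Counter(flag for _, flag in results)
summary = {"Success": flags["Y"], "Failure": flags["N"]}
print(summary)  # {'Success': 2, 'Failure': 1}
```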
The following output filters are available from the RegEx Split processor:
In this example, RegEx Split is used to split data from a Notes attribute on an Employees table on either side of a person's initials (a sequence of 2 or 3 upper case characters).
Regular expression: ([A-Z]{2,3})
Results (successful splits):
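As a rough illustration of how this expression behaves, the Python sketch below applies it to a few invented Notes values; these are not the original Employees data. The helper `regex_split` is hypothetical. It walks the matches with `re.finditer` so that the matched initials are dropped even though the expression contains a capturing group.

```python
import re

def regex_split(pattern, text):
    """Split text around every match of pattern, discarding the matched text."""
    parts, last = [], 0
    for m in re.finditer(pattern, text):
        parts.append(text[last:m.start()])  # text before this match
        last = m.end()                      # skip over the matched initials
    parts.append(text[last:])               # remainder after the last match
    return parts

INITIALS = r"([A-Z]{2,3})"  # regular expression from the example

notes = [  # invented sample values
    "Reviewed by JS on return from leave",
    "Approved ABC pending sign-off",
    "no initials recorded",
]
for note in notes:
    print(regex_split(INITIALS, note))
# ['Reviewed by ', ' on return from leave']
# ['Approved ', ' pending sign-off']
# ['no initials recorded']
```

Note that a single capital letter, such as the one at the start of a sentence, does not match, because the expression requires at least two consecutive upper case characters.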
Oracle® Enterprise Data Quality Help version 9.0
Copyright © 2006, 2011 Oracle and/or its affiliates. All rights reserved.