Strip Words
The Strip Words transformation processor removes any occurrences of words that match a Reference Data list from attribute values.
Strip Words can be used to remove extraneous words from attributes, often with a view to creating values for matching. For example, when matching companies using a Company Name field, it may be useful to remove less significant words that occur in various forms, or which may occur in some values and not others, such as LTD, LIMITED, UK, PLC and so on.
Any String or String Array type attributes from which you wish to strip words. Number and Date attributes are not valid inputs.
Note that if you input an Array attribute, the transformation will apply to all array elements, and an Array attribute will be output.
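The array behavior described above can be pictured with a small Python sketch. This is not EDQ code: `strip_value` and `strip_attribute` are hypothetical names, and matching is simplified here to whitespace-separated, case-insensitive comparison, with the remaining words rejoined by single spaces.

```python
from typing import List, Set, Union


def strip_value(value: str, words: Set[str]) -> str:
    """Drop whitespace-separated tokens found in `words` (simplified matching)."""
    lowered = {w.lower() for w in words}
    return " ".join(t for t in value.split() if t.lower() not in lowered)


def strip_attribute(attr: Union[str, List[str]], words: Set[str]) -> Union[str, List[str]]:
    """Apply strip_value to a String, or to every element of a String Array."""
    if isinstance(attr, list):
        # Array in, array out: each element is processed independently.
        return [strip_value(element, words) for element in attr]
    return strip_value(attr, words)


print(strip_attribute("Acme Ltd", {"LTD"}))                          # Acme
print(strip_attribute(["Acme Ltd", "Widget Co UK"], {"LTD", "UK"}))  # ['Acme', 'Widget Co']
```

The output keeps the shape of the input: a single string yields a single string, and a list yields a list of the same length.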
Option | Type | Purpose | Default Value |
Reference data | Reference Data | The list of words that you wish to strip from attribute values. | None |
Delimiters | Reference Data | Provides a way of specifying a standard, reusable set of delimiter characters for breaking up a value into words, and allows you to use control characters as delimiters. Note that only single characters (not strings of characters) can be used as delimiters. Multi-character delimiters will be ignored. | *Delimiters |
Delimiters list | Free text entry | Allows you to specify delimiters to use without having to create reference data for simple delimiters such as space or comma. Note that if these are used in addition to a reference list, all delimiters from both options will be used to break up the data. | Space |
Ignore case? | Yes/No | Determines whether to ignore case when matching the list of words to strip. | Yes |
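How the two delimiter options and the case option interact can be sketched in Python as follows. This is an illustration under stated assumptions, not the processor's implementation: the function names are hypothetical, and the sketch assumes that each character typed into the free-text Delimiters list is treated as its own delimiter.

```python
import re
from typing import Iterable, List, Set


def effective_delimiters(reference_delims: Iterable[str], delimiters_list: str) -> Set[str]:
    """Combine both delimiter options, keeping single characters only."""
    # Multi-character entries in the reference list are ignored.
    singles = {d for d in reference_delims if len(d) == 1}
    # Assumption: every character of the free-text entry is a delimiter.
    return singles | set(delimiters_list)


def split_words(value: str, delimiters: Set[str]) -> List[str]:
    """Break a value into words on any of the delimiter characters."""
    if not delimiters:
        return [value] if value else []
    pattern = "[" + re.escape("".join(sorted(delimiters))) + "]"
    return [w for w in re.split(pattern, value) if w]


def is_strip_word(word: str, strip_list: Set[str], ignore_case: bool = True) -> bool:
    """Check a word against the list, honoring the 'Ignore case?' option."""
    if ignore_case:
        return word.lower() in {s.lower() for s in strip_list}
    return word in strip_list


delims = effective_delimiters([",", "\t", "--"], " ")  # "--" is dropped
print(split_words("Acme,Ltd UK", delims))              # ['Acme', 'Ltd', 'UK']
print(is_strip_word("ltd", {"LTD"}))                   # True
print(is_strip_word("ltd", {"LTD"}, ignore_case=False))  # False
```

Because the sets are merged, a value is broken on a comma from the reference list and on a space from the free-text list in the same pass.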
Data attribute | Type | Purpose | Value |
[Attribute Name].StrippedWords | Derived | A new attribute with any matching words stripped. | The original attribute value, with any words that matched your reference list stripped out. The original delimiters used in the input value will be preserved. |
None
Execution Mode | Supported |
Batch | Yes |
Real time Monitoring | Yes |
Real time Response | Yes |
The Strip Words transformer presents no summary statistics on its processing.
In the Data view, each input attribute is shown with its new derived attribute, with words stripped, to its right.
None
In this example, Strip Words is used to remove less significant words such as 'Limited', 'Ltd.', 'Services' and 'Associates' from a field containing Company Names.
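The effect on company names can be approximated with the Python sketch below. It is a stand-in for the processor, not its actual logic: `strip_words` is a hypothetical function, the company names are made-up sample values, and the sketch simply leaves every delimiter character where it was, so removed words leave their surrounding spaces behind. The real processor may tidy leftover delimiters differently.

```python
import re
from typing import Iterable


def strip_words(value: str, words: Iterable[str], delimiters: str = " ",
                ignore_case: bool = True) -> str:
    """Remove listed words from `value`, leaving delimiter characters in place."""
    norm = (lambda s: s.lower()) if ignore_case else (lambda s: s)
    targets = {norm(w) for w in words}
    if not delimiters:
        # No delimiters: the whole value is treated as one word.
        return "" if norm(value) in targets else value
    # The capturing group makes re.split return the delimiters as well,
    # so parts alternates word, delimiter, word, ...
    parts = re.split("([" + re.escape(delimiters) + "])", value)
    kept = [p for i, p in enumerate(parts) if i % 2 == 1 or norm(p) not in targets]
    return "".join(kept)


# Word list taken from the example text; company names are hypothetical.
strip_list = ["Limited", "Ltd.", "Services", "Associates"]
for name in ["Acme Services Ltd.", "Brown Associates LIMITED", "Northwind Traders"]:
    print(repr(strip_words(name, strip_list)))
# 'Acme  '
# 'Brown  '
# 'Northwind Traders'
```

With the default Ignore case? setting of Yes, 'LIMITED' is removed even though the list contains 'Limited'. Passing `ignore_case=False` in the sketch would leave it in place.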
Oracle® Enterprise Data Quality Help version 9.0
Copyright © 2006, 2011 Oracle and/or its affiliates. All rights reserved.