Strip Words

The Strip Words transformation processor removes any occurrences of words that match a Reference Data list from attribute values.

Use

Strip Words can be used to remove extraneous words from attributes, often with a view to creating values for matching. For example, when matching companies using a Company Name field, it may be useful to remove less significant words that occur in various forms, or which may occur in some values and not others, such as LTD, LIMITED, UK, PLC and so on.

Configuration

Inputs

Any String or String Array type attributes from which you wish to strip words. Number and Date attributes are not valid inputs.

Note that if you input an Array attribute, the transformation will apply to all array elements, and an Array attribute will be output.
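
The array behavior described above can be pictured with a minimal Python sketch. This is not EDQ code: `apply_to_value` is a hypothetical helper, and `transform` stands in for whatever per-string stripping logic is applied. It only illustrates that a String input yields a String and an Array input yields an Array with every element transformed.

```python
from typing import Callable, List, Union

# A value is either a single string or an array (list) of strings.
Value = Union[str, List[str]]


def apply_to_value(value: Value, transform: Callable[[str], str]) -> Value:
    """Apply a per-string transform to a string, or to each element of an array."""
    if isinstance(value, list):
        # Array in, array out: every element is transformed independently.
        return [transform(element) for element in value]
    # Single string in, single string out.
    return transform(value)
```

Any string-to-string function can be passed as `transform`; the shape of the output always mirrors the shape of the input.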

Options

Option: Reference data
Type: Reference Data
Purpose: The list of words that you wish to strip from attribute values.
Default Value: None

Option: Delimiters
Type: Reference Data
Purpose: Provides a way of specifying a standard, reusable set of delimiter characters for breaking up a value into words, and allows you to use control characters as delimiters. Note that only single characters (not strings of characters) can be used as delimiters. Multi-character delimiters will be ignored.
Default Value: *Delimiters

Option: Delimiters list
Type: Free text entry
Purpose: Allows you to specify delimiters to use without having to create reference data for simple delimiters such as space or comma. Note that if these are used in addition to a reference list, all delimiters from both options will be used to break up the data.
Default Value: Space

Option: Ignore case?
Type: Yes/No
Purpose: Drives whether or not to ignore case when matching the list of words to strip.
Default Value: Yes
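
As a rough illustration of how these options interact, the Python sketch below combines delimiters from both sources, discards multi-character delimiter entries, and builds a word matcher that optionally ignores case. The function names are hypothetical and not part of EDQ. Treating each character of the free-text entry as a separate delimiter is an assumption made for this sketch.

```python
from typing import Callable, Iterable, Set


def build_delimiters(reference_delimiters: Iterable[str], free_text: str = " ") -> Set[str]:
    """Combine delimiters from a reference list with those typed as free text."""
    # Only single characters are valid; multi-character entries are ignored.
    from_reference = {d for d in reference_delimiters if len(d) == 1}
    # Assumption: each character in the free-text entry is one delimiter.
    from_free_text = set(free_text)
    # Delimiters from both options are used together.
    return from_reference | from_free_text


def make_matcher(words: Iterable[str], ignore_case: bool = True) -> Callable[[str], bool]:
    """Return a predicate that tests whether a token is in the word list."""
    normalise = str.casefold if ignore_case else (lambda w: w)
    lookup = {normalise(w) for w in words}
    return lambda token: normalise(token) in lookup
```

With `ignore_case` left at its default, a list entry of LTD would match "Ltd" and "ltd" as well; with it switched off, only the exact form in the list matches.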

Outputs

Data attributes

Data attribute: [Attribute Name].StrippedWords
Type: Derived
Purpose: A new attribute with any matching words stripped.
Value: The original attribute value, with any words that matched your reference list stripped out. The original delimiters used in the input value will be preserved.
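
The following Python sketch shows one way to picture the derived value. It is not the EDQ implementation: `strip_words` is a hypothetical function, and it assumes that "preserved" means every delimiter in the input stays exactly where it was, including delimiters next to a removed word. The actual processor may treat adjacent delimiters differently.

```python
import re
from typing import Iterable


def strip_words(value: str, words: Iterable[str],
                delimiters: str = " ", ignore_case: bool = True) -> str:
    """Remove listed words from value, leaving all delimiters in place.

    `delimiters` must contain at least one character.
    """
    normalise = str.casefold if ignore_case else (lambda w: w)
    lookup = {normalise(w) for w in words}
    # The capturing group makes re.split return the delimiters as list items,
    # so they can be written back unchanged.
    pattern = "([" + re.escape(delimiters) + "])"
    parts = re.split(pattern, value)
    # Drop only the tokens that match the word list (assumption: delimiters
    # next to a removed word are kept as they are).
    kept = [part for part in parts if normalise(part) not in lookup]
    return "".join(kept)
```

Under these assumptions, `strip_words("Acme Trading Ltd", ["LTD", "LIMITED"])` returns `"Acme Trading "`, with the space that preceded the removed word still present.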

Flags

None

Execution

Execution Mode (Supported)

Batch: Yes
Real time Monitoring: Yes
Real time Response: Yes

Results Browsing

The Strip Words transformer presents no summary statistics on its processing.

In the Data view, each input attribute is shown with its new derived attribute with words stripped to the right.

Output Filters

None

Example

In this example, Strip Words is used to remove less significant words such as 'Limited', 'Ltd.', 'Services' and 'Associates' from a field containing Company Names.

Oracle ® Enterprise Data Quality Help version 9.0
Copyright © 2006,2011 Oracle and/or its affiliates. All rights reserved.