Generate Initials

The Generate Initials processor transforms values into their initials, for example, transforming "Bayerische Motoren Werke" into "BMW".

Use

The Generate Initials transformation is most commonly used to match data (or cluster records for matching) where both the abbreviated and non-abbreviated forms of names (or other terms) are used. It helps to find matches such as "International Business Machines" and "IBM", which are hard to match automatically without first initializing each value. An option is included to ensure that short 'words' such as "IBM" are not initialized to "I".

Configuration

Inputs

Any String or String Array type attributes that you wish to convert to initials. Number and Date attributes are not valid inputs.

Note that if you input an Array attribute, the transformation will apply to all array elements, and an Array attribute will be output.

Options

| Option | Type | Purpose | Default Value |
| --- | --- | --- | --- |
| Delimiters Reference Data | Reference Data | Allows the use of a standard set of characters that are used to split up words before generating initials. | *Delimiters |
| Delimiter characters | Free text | Specifies an additional set of characters that are used to split up words before generating initials. | Space |
| Ignore upper case single words of length | Integer | Allows the Generate Initials processor to leave alone any single word values (that is, where no word splits occurred) of up to a number of characters in length, and which are all upper case (for example, 'IBM'). See note below. | 4 |

Note: Normally, the Generate Initials transformation simply ignores the case of the original value, and generates upper case initials for each separate word it finds, as separated by the specified delimiters. For example, the values "A j Smith", "ALAN JOHN SMITH" and "Alan john smith" are all initialized as "AJS". However, there may be some values which are already initialized, for example, "PWC", "IBM", "BT", which should not be further initialized to "P", "I" and "B" respectively.

These can be distinguished by the fact that they are:

a. single word values,  

b. already in upper case, and

c. only a few characters in length.

The Ignore upper case single words of length option allows you to specify a word length (in characters) at or below which single upper case word values are not initialized.

For example, if set to 4, the values "PWC", "BT", "RSPB" and "IBM" would be ignored during the initialization process, as they are 4 characters or fewer in length, are single word values, and are already upper case. By contrast, "IAN JOHN SMITH" would still be initialized to "IJS": although the word "IAN" is fewer than 4 characters in length and is already upper case, it is not a single word value. Also, "RSPCA" would be initialized to "R", as it is over 4 characters in length.
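The rules described in this note can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the processor's actual implementation: the function name `generate_initials` and its parameters `delimiters` and `ignore_upper_length` are hypothetical names chosen for this sketch, and the handling of edge cases (such as empty values or leading delimiters) is an assumption.

```python
def generate_initials(value, delimiters=" ", ignore_upper_length=4):
    """Illustrative sketch only; names and edge-case handling are assumptions."""
    # Split the value into words on any of the delimiter characters,
    # discarding empty pieces produced by repeated delimiters.
    words = []
    current = ""
    for ch in value:
        if ch in delimiters:
            if current:
                words.append(current)
            current = ""
        else:
            current += ch
    if current:
        words.append(current)

    # Leave short, all upper case, single word values alone (for example "IBM").
    if (len(words) == 1
            and words[0].isupper()
            and len(words[0]) <= ignore_upper_length):
        return words[0]

    # Otherwise ignore the original case and emit one upper case initial per word.
    return "".join(word[0].upper() for word in words)


for name in ["A j Smith", "IAN JOHN SMITH", "IBM", "RSPB", "RSPCA"]:
    print(name, "->", generate_initials(name))
```

With the default settings assumed here, the multi-word values are reduced to initials, "IBM" and "RSPB" are left unchanged, and "RSPCA" is reduced to "R" because it exceeds the length limit.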

Outputs

Data attributes

| Data attribute | Type | Purpose | Value |
| --- | --- | --- | --- |
| [Attribute Name].initials | Derived | A new attribute with the initialized values. | The original attribute value, converted to initials. |

Flags

None

Execution

| Execution Mode | Supported |
| --- | --- |
| Batch | Yes |
| Real time Monitoring | Yes |
| Real time Response | Yes |

Results Browsing

The Generate Initials processor presents no summary statistics on its processing.

In the Data view, each input attribute is shown with its new derived initialized attribute to the right.

Output Filters

None

Example

In this example, the Generate Initials transformation is used to transform company names into their initialized values, using the default configuration, that is:

Delimiters Reference Data: not used

Delimiter characters: space

Ignore upper case single words of length: 4

Note that 'BMW' is not initialized to 'B' as it is a single upper case word with only 3 characters, so is assumed to represent initials already.

Oracle ® Enterprise Data Quality Help version 9.0
Copyright © 2006,2011 Oracle and/or its affiliates. All rights reserved.