Equal Attributes Profiler

The Equal Attributes Profiler searches records across a number of attributes for pairs of attributes whose values are frequently equal; for example, where FirstName and GivenName attributes are both stored and normally hold the same value. A threshold option determines whether or not a pair of attributes is related, based on the percentage of values that are equal across the two attributes.

Use

Use the Equal Attributes Profiler to find possibly redundant attributes, or pairs of attributes where values are normally equal, but in some cases are not. The Equal Attributes Profiler can help find bad data where two values in related attributes do not relate to each other when they should.

Configuration

Inputs

Any attributes that you wish to examine for equal attribute linkage.

Options

Option | Type | Purpose | Default Value
------ | ---- | ------- | -------------
Equal attribute threshold | Percentage | Controls the percentage of values that must be equal in two attributes for those two attributes to be considered as related, and to appear in the results. Note that the value must be between 50% and 100% inclusive. | 80%
Treat nulls as equal? | Yes/No | Controls whether or not pairs of Null values are considered to be equal, and therefore whether or not they will be considered when appraising the Equal attribute threshold (above). | Yes
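
The interaction of the two options can be illustrated with a minimal Python sketch. This is not the product's implementation: the function names (equal_percentage, related_pairs), the use of None to stand for Null, the choice of all records as the denominator, and the sample data are assumptions made purely for illustration.

```python
from itertools import combinations


def equal_percentage(records, attr_a, attr_b, nulls_equal=True):
    """Percentage of records in which attr_a and attr_b hold the same value.

    Assumptions: None stands for Null, and the denominator is every record.
    """
    if not records:
        return 0.0
    equal = 0
    for rec in records:
        a, b = rec.get(attr_a), rec.get(attr_b)
        if a is None and b is None:
            # A pair of Nulls counts as equal only when the option is on.
            if nulls_equal:
                equal += 1
        elif a == b:
            equal += 1
    return 100.0 * equal / len(records)


def related_pairs(records, attributes, threshold=80.0, nulls_equal=True):
    """Return (attr_a, attr_b, percentage) for each pair meeting the threshold."""
    if not 50.0 <= threshold <= 100.0:
        raise ValueError("threshold must be between 50 and 100 inclusive")
    result = []
    for attr_a, attr_b in combinations(attributes, 2):
        pct = equal_percentage(records, attr_a, attr_b, nulls_equal)
        if pct >= threshold:  # assumption: a pair exactly at the threshold qualifies
            result.append((attr_a, attr_b, pct))
    return result


# Hypothetical sample data: the two name attributes agree in 4 of 5 records
# when the pair of Nulls is counted as equal.
customers = [
    {"FirstName": "Ann", "GivenName": "Ann", "City": "Leeds"},
    {"FirstName": "Bob", "GivenName": "Bob", "City": "York"},
    {"FirstName": None, "GivenName": None, "City": "Bath"},
    {"FirstName": "Dee", "GivenName": "Dee", "City": "Hull"},
    {"FirstName": "Eve", "GivenName": "Eva", "City": "Ely"},
]
print(related_pairs(customers, ["FirstName", "GivenName", "City"]))
```

With the default settings the sketch reports only the FirstName and GivenName pair, at 80%. Passing nulls_equal=False drops that pair to 60%, below the threshold, so nothing is reported.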

Outputs

Data attributes

None

Flags

None

Execution

Execution Mode | Supported
-------------- | ---------
Batch | Yes
Real time Monitoring | Yes
Real time Response | No

The Equal Attributes Profiler requires a batch of records to produce its statistics; that is, in order to find meaningful relationships between pairs of attributes, it must run to completion. Therefore, its results are not available until the full data set has been processed, and this processor is not suitable for a process that requires a real time response.

When executed against a batch of transactions from a real time data source, it will finish its processing when the commit point (transaction or time limit) configured on the Read Processor is reached.

Results Browsing

The Equal Attributes Profiler provides a summary view of any pairs of attributes that have a high enough percentage of equal values. The top-level view shows the following statistics for each pair of related (equal) attributes:

Statistic | Meaning
--------- | -------
Equal | The number of records where the values for both the related attributes were the same.
Null pairs | The number of records where the values for both the related attributes were null. Note: If the option to treat nulls as equal is selected, this will be zero, as the null pairs will be included in the Equal statistic.
Not equal | The number of records where the values for the related attributes were not the same.

Additional Data

Click on the Additional Data button to display the above statistics as percentages of the records analyzed.
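
As a rough illustration of how the three statistics and their percentage form relate to each other, the following Python sketch counts them for one pair of attributes. The function names (pair_statistics, as_percentages), the use of None for null, and the sample rows are hypothetical; the product's internal logic may differ.

```python
def pair_statistics(records, attr_a, attr_b, nulls_equal=True):
    """Count Equal, Null pairs and Not equal for one pair of attributes.

    Assumption: None stands for null. A record with exactly one null value
    is counted as Not equal.
    """
    stats = {"Equal": 0, "Null pairs": 0, "Not equal": 0}
    for rec in records:
        a, b = rec.get(attr_a), rec.get(attr_b)
        if a is None and b is None:
            # Null pairs are folded into Equal when the option is selected.
            stats["Equal" if nulls_equal else "Null pairs"] += 1
        elif a == b:
            stats["Equal"] += 1
        else:
            stats["Not equal"] += 1
    return stats


def as_percentages(stats):
    """Express each statistic as a percentage of the records analyzed."""
    total = sum(stats.values())
    return {name: (100.0 * count / total if total else 0.0)
            for name, count in stats.items()}


# Hypothetical rows, not taken from any real data set.
rows = [
    {"DT_PURCHASED": "2010-01-05", "DT_ACC_OPEN": "2010-01-05"},
    {"DT_PURCHASED": "2010-02-11", "DT_ACC_OPEN": "2010-02-11"},
    {"DT_PURCHASED": None, "DT_ACC_OPEN": None},
    {"DT_PURCHASED": "2010-03-20", "DT_ACC_OPEN": "2009-12-01"},
]
counts = pair_statistics(rows, "DT_PURCHASED", "DT_ACC_OPEN", nulls_equal=False)
print(counts)
print(as_percentages(counts))
```

In this sketch the three counts always sum to the number of records, so the percentages sum to 100.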

Drill down on the number of records where the pair of attributes matched exactly to see a breakdown of the frequency of occurrence of each matching value. Drill down again to see the records.

Alternatively, drill down on the number of records where the pair of attributes were not equal to see the records directly. If there should be a relationship between attributes, these will be the records where the relationship is broken.
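
The two drill-down paths can be approximated outside the product with a short Python sketch. The helper names (matching_value_frequencies, broken_records), the decision to leave null pairs out of the frequency breakdown, and the sample rows are all assumptions for illustration only.

```python
from collections import Counter


def matching_value_frequencies(records, attr_a, attr_b):
    """Frequency of each value shared by both attributes.

    Assumption: None stands for null, and null pairs are left out of the
    breakdown.
    """
    return Counter(
        rec.get(attr_a)
        for rec in records
        if rec.get(attr_a) is not None and rec.get(attr_a) == rec.get(attr_b)
    )


def broken_records(records, attr_a, attr_b):
    """Records in which the two attributes differ (a pair of nulls does not)."""
    return [rec for rec in records if rec.get(attr_a) != rec.get(attr_b)]


# Hypothetical sample rows.
rows = [
    {"id": 1, "FirstName": "Ann", "GivenName": "Ann"},
    {"id": 2, "FirstName": "Ann", "GivenName": "Ann"},
    {"id": 3, "FirstName": "Bob", "GivenName": "Bob"},
    {"id": 4, "FirstName": "Eve", "GivenName": "Eva"},
    {"id": 5, "FirstName": None, "GivenName": "Lee"},
]
print(matching_value_frequencies(rows, "FirstName", "GivenName").most_common())
print([rec["id"] for rec in broken_records(rows, "FirstName", "GivenName")])
```

Here the frequency breakdown lists Ann twice and Bob once, and the broken records are those with id 4 and id 5.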

Example

In this example, a Customer table is analyzed to see if any of its attributes are commonly equal to each other, using the default configuration. The Equal Attributes Profiler finds that the DT_PURCHASED and DT_ACC_OPEN attributes are normally equal:

Summary View

By drilling down on the number of records where the two fields were equal, you can see a view of all the pairs of equal values.

Oracle ® Enterprise Data Quality Help version 9.0
Copyright © 2006,2011 Oracle and/or its affiliates. All rights reserved.