Record Duplication Profiler

The Record Duplication Profiler allows you to find records that are exact duplicates of one another, based on the selected attributes.

Use

Use the Record Duplication Profiler to check whether any records in the data set have been entirely duplicated, for example due to an error in data migration.

As you can select the attributes to use in the duplicate check, you can choose to find records that are duplicates based on a subset of the total record - for example, customer records that are duplicates by name, address, and postcode.
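The subset-based check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the processor is implemented: the function name `flag_duplicates`, the dictionary record layout, and the sample customer data are all assumptions made for the example, and the comparison is a plain exact match with none of the processor's options applied.

```python
from collections import Counter

def flag_duplicates(records, attributes):
    """Return a 'Y'/'N' flag per record: 'Y' when at least one other record
    has the same values in every selected attribute (hypothetical helper)."""
    # Build one comparison key per record from the selected attributes only.
    keys = [tuple(record.get(name) for name in attributes) for record in records]
    counts = Counter(keys)
    return ["Y" if counts[key] > 1 else "N" for key in keys]

# Hypothetical customer records; 'id' differs, the other attributes repeat.
customers = [
    {"id": 1, "name": "Ann Lee", "address": "1 High St", "postcode": "AB1 2CD"},
    {"id": 2, "name": "Ann Lee", "address": "1 High St", "postcode": "AB1 2CD"},
    {"id": 3, "name": "Bob Ray", "address": "9 Low Rd", "postcode": "EF3 4GH"},
]

flags = flag_duplicates(customers, ["name", "address", "postcode"])
print(flags)  # ['Y', 'Y', 'N']
```

Including `id` in the attribute list would flag no records at all, which is why the choice of input attributes determines what counts as a duplicate.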

Configuration

Inputs

Any attributes that you want to use in the duplicate check.

Options

| Option | Type | Purpose | Default Value |
| --- | --- | --- | --- |
| Consider no data values as duplicates? | Yes/No | Determines whether or not records that have Null values in all attributes will be considered as duplicates of one another. | Yes |
| Ignore case? | Yes/No | Determines whether or not the duplication analysis should ignore case. | Yes |

Note: Records that have Null values in some, but not all, attributes, and which exactly match other records, will always be considered as duplicates.
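To make the interaction between the two options and the note concrete, here is a minimal Python sketch. It rests on stated assumptions rather than on the processor's actual code: a Null is modeled as `None`, case is ignored with `str.casefold()`, and the names `comparison_key`, `flag_with_options`, `nulls_as_duplicates`, and `ignore_case` are hypothetical.

```python
from collections import Counter

def comparison_key(record, attributes, ignore_case=True):
    """Build the tuple of values that two records must share to match."""
    values = []
    for name in attributes:
        value = record.get(name)  # assumption: a Null is represented as None
        if ignore_case and isinstance(value, str):
            value = value.casefold()
        values.append(value)
    return tuple(values)

def flag_with_options(records, attributes, nulls_as_duplicates=True, ignore_case=True):
    """Flag each record 'Y' or 'N' under the two options (both default to Yes)."""
    keys = [comparison_key(r, attributes, ignore_case) for r in records]
    counts = Counter(keys)
    flags = []
    for key in keys:
        all_null = all(value is None for value in key)
        if all_null and not nulls_as_duplicates:
            flags.append("N")  # all-Null records are never treated as duplicates
        else:
            # Partly Null records still match when every value is identical.
            flags.append("Y" if counts[key] > 1 else "N")
    return flags

rows = [
    {"ADDRESS1": "1 High St", "ADDRESS2": None},
    {"ADDRESS1": "1 HIGH ST", "ADDRESS2": None},
    {"ADDRESS1": None, "ADDRESS2": None},
    {"ADDRESS1": None, "ADDRESS2": None},
]
attrs = ["ADDRESS1", "ADDRESS2"]

print(flag_with_options(rows, attrs))                             # ['Y', 'Y', 'Y', 'Y']
print(flag_with_options(rows, attrs, nulls_as_duplicates=False))  # ['Y', 'Y', 'N', 'N']
print(flag_with_options(rows, attrs, ignore_case=False))          # ['N', 'N', 'Y', 'Y']
```

The first two rows have a Null in only one attribute, so they are flagged whenever their remaining values match, whatever the Null option is set to.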

Outputs

Data attributes

None

Flags

| Flag attribute | Purpose | Possible Values |
| --- | --- | --- |
| RecordDuplicate | Indicates whether the record is duplicated elsewhere in the data set, based on the selected attributes | Y/N |

Execution

| Execution Mode | Supported |
| --- | --- |
| Batch | Yes |
| Real time Monitoring | Yes |
| Real time Response | No |

The Record Duplication Profiler assesses duplication across a batch of records. It must therefore run to completion before its results are available, and is not suitable for a process that requires a real time response.

When executed against a batch of transactions from a real time data source, it will finish its processing when the commit point (transaction or time limit) configured on the Read Processor is reached. The statistics returned will indicate the number of duplicates in the batch of transactions only.
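The per-batch scope of the statistics can be illustrated with a short Python sketch. It is an assumption-laden model, not the product's mechanism: the commit point is simplified to a fixed transaction count (the time limit is not modeled), and the names `batches`, `duplicates_per_batch`, and `commit_size` are hypothetical.

```python
from collections import Counter
from itertools import islice

def batches(transactions, commit_size):
    """Yield consecutive lists of at most commit_size transactions."""
    iterator = iter(transactions)
    while True:
        batch = list(islice(iterator, commit_size))
        if not batch:
            return
        yield batch

def duplicates_per_batch(transactions, attributes, commit_size):
    """Count duplicated records separately inside each batch."""
    results = []
    for batch in batches(transactions, commit_size):
        keys = [tuple(t.get(a) for a in attributes) for t in batch]
        counts = Counter(keys)
        results.append(sum(1 for key in keys if counts[key] > 1))
    return results

# Hypothetical transaction stream: 'Bob' appears once in each batch of three.
stream = [
    {"name": "Ann"}, {"name": "Ann"}, {"name": "Bob"},
    {"name": "Bob"}, {"name": "Cy"}, {"name": "Di"},
]

print(duplicates_per_batch(stream, ["name"], commit_size=3))  # [2, 0]
```

The two `Bob` transactions fall on either side of the commit point, so neither batch reports them as duplicates. Only the two `Ann` transactions in the first batch are counted.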

Results Browsing

The Record Duplication Profiler produces a summary view of its results, showing the following statistics:

| Statistic | Meaning |
| --- | --- |
| Duplicated | The number of records that are duplicated across the attributes analyzed |
| Not duplicated | The number of records that are not duplicated across the attributes analyzed |
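As a rough illustration of how these two statistics relate, the Python sketch below derives them from a list of comparison keys. It assumes that every record in a group of identical records counts towards Duplicated, which is one reading of the definitions above and is not confirmed here. The function name `summary_view` and the sample keys are hypothetical.

```python
from collections import Counter

def summary_view(keys):
    """Return the two summary statistics for a list of comparison keys."""
    counts = Counter(keys)
    # Assumption: every member of a repeated group counts as duplicated.
    duplicated = sum(n for n in counts.values() if n > 1)
    return {"Duplicated": duplicated, "Not duplicated": len(keys) - duplicated}

# Three records share one address; a fourth record is unique.
keys = [("1 High St", "Flat 2")] * 3 + [("9 Low Rd", None)]

print(summary_view(keys))  # {'Duplicated': 3, 'Not duplicated': 1}
```

Under this reading the two statistics always add up to the total number of records analyzed.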

Example

In this example, the Record Duplication Profiler finds duplicates in a Customers table using two attributes, ADDRESS1 and ADDRESS2.

Summary View

Drilldown on records with Duplicated values:

Oracle ® Enterprise Data Quality Help version 9.0
Copyright © 2006,2011 Oracle and/or its affiliates. All rights reserved.