Contained Attributes Profiler
The Contained Attributes Profiler searches records across a number of attributes for pairs of attributes where one attribute value often contains the other attribute's value. A threshold option determines whether a pair of attributes is reported as related, based on the percentage of records in which one attribute value contains the other.
Use the Contained Attributes Profiler to find attributes which are, or should be, related. Where there is strong attribute linkage, this may indicate a potentially redundant attribute.
Alternatively, attributes may be intended to be related, but that relationship may be broken; for example, one column value may be blank even though it could be derived from another column's value.
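The pairwise check described above can be sketched roughly as follows. This is an illustrative Python sketch, not the product's implementation: the function names `contains` and `profile_contained`, the dictionary-based record layout, the rule that blank values never match, and the choice to test containment in either direction are all assumptions made for the example.

```python
from itertools import combinations


def contains(a, b, ignore_case=True):
    """Return True if either string value contains the other.

    Assumption: blank or missing values never count as contained.
    """
    if not a or not b:
        return False
    if ignore_case:
        a, b = a.lower(), b.lower()
    return a in b or b in a


def profile_contained(records, attributes, threshold_pct=80.0, ignore_case=True):
    """Report attribute pairs whose contained percentage meets the threshold.

    `records` is a list of dicts mapping attribute name to string value
    (a hypothetical layout chosen for this sketch).
    """
    results = []
    total = len(records)
    for left, right in combinations(attributes, 2):
        contained = sum(
            1 for r in records
            if contains(r.get(left), r.get(right), ignore_case)
        )
        pct = 100.0 * contained / total if total else 0.0
        if pct >= threshold_pct:
            results.append({
                "pair": (left, right),
                "contained": contained,
                "not_contained": total - contained,
                "pct": pct,
            })
    return results
```

Every pair of input attributes is compared once, so the work grows with the square of the number of attributes selected. Pairs that fall below the threshold are simply left out of the results.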
Specify as inputs any attributes that you wish to examine for contained attribute linkage.
| Option | Type | Purpose | Default Value |
| Contained attribute threshold % | Percentage | Controls the percentage of values that must match using Contains matching in two attributes for those two attributes to be considered as related, and to appear in the results. | 80%. Note that the value must be between 50% and 100% inclusive. |
| Ignore case? | Yes/No | Controls whether or not case will be ignored when checking if one attribute value contains another. | Yes |
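As an illustration of the two options and their documented defaults, the following Python sketch models them as a small validated configuration object. The class name `ContainedAttributesOptions` and its field names are hypothetical; only the defaults (80%, Yes) and the permitted threshold range (50% to 100% inclusive) come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainedAttributesOptions:
    """Hypothetical container for the two profiler options."""

    threshold_pct: float = 80.0   # "Contained attribute threshold %"
    ignore_case: bool = True      # "Ignore case?"

    def __post_init__(self):
        # The documented range is 50% to 100% inclusive.
        if not 50.0 <= self.threshold_pct <= 100.0:
            raise ValueError(
                "threshold_pct must be between 50 and 100 inclusive, "
                f"got {self.threshold_pct}"
            )
```

Constructing the object with no arguments gives the default configuration; a threshold outside the permitted range raises `ValueError` in this sketch.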
None
None
| Execution Mode | Supported |
| Batch | Yes |
| Real time Monitoring | Yes |
| Real time Response | No |
The Contained Attributes Profiler requires a batch of records to produce its statistics; that is, in order to find meaningful relationships between pairs of attributes, it must run to completion. Therefore, its results are not available until the full data set has been processed, and this processor is not suitable for a process that requires a real time response.
When executed against a batch of transactions from a real time data source, it will finish its processing when the commit point (transaction or time limit) configured on the Read Processor is reached.
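The reason results are only meaningful once a batch is complete can be seen in a small accumulator sketch. This Python example is an assumption-laden illustration, not the product's behavior: the class `PairAccumulator` and its `add` and `finish` methods are hypothetical names, and the commit point is represented simply by the caller deciding when to call `finish`.

```python
class PairAccumulator:
    """Counts contained and not-contained records for one attribute pair."""

    def __init__(self, left, right, ignore_case=True):
        self.left = left
        self.right = right
        self.ignore_case = ignore_case
        self.contained = 0
        self.total = 0

    def add(self, record):
        """Fold one record (a dict of attribute name to string) into the counts."""
        a, b = record.get(self.left), record.get(self.right)
        self.total += 1
        if a and b:  # assumption: blank values never count as contained
            if self.ignore_case:
                a, b = a.lower(), b.lower()
            if a in b or b in a:
                self.contained += 1

    def finish(self, threshold_pct=80.0):
        """Summarize the counts; intended to be called at the commit point."""
        pct = 100.0 * self.contained / self.total if self.total else 0.0
        return {
            "contained": self.contained,
            "not_contained": self.total - self.contained,
            "pct": pct,
            "related": pct >= threshold_pct,
        }
```

Calling `finish` after only a few records can report a pair as related when the full batch would not, which is why a per-record real time response is not meaningful for this kind of statistic.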
The Contained Attributes Profiler provides a summary view of any pairs of attributes that have a high enough percentage of related values, where one attribute value often contains the other. The top-level view shows the following statistics for each pair of related attributes:
| Statistic | Meaning |
| Contained | The number of records where one of the related attribute values contained the other. |
| Not contained | The number of records where neither of the related attribute values contained the other. |
Additional Data
Click on the Additional Data button to display the above statistics as percentages of the records analyzed.
Drill down on the number of records where one attribute value contained the other to see a breakdown of the frequency of occurrence of each matching value. Drill down again to see the records.
Alternatively, drill down on the number of records where neither attribute value contained the other to see the records directly. If there should be a relationship between attributes, these will be the records where the relationship is broken.
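The two drill-down views can be approximated with a short Python sketch. It is illustrative only: the function name `drill_down`, the dictionary-based records, and the containment rule (either direction, blanks never match) are assumptions rather than documented behavior.

```python
from collections import Counter


def drill_down(records, left, right, ignore_case=True):
    """Split records for one attribute pair into the two drill-down views.

    Returns (frequencies, broken), where `frequencies` lists each distinct
    (left value, right value) pair with its record count, most frequent
    first, and `broken` lists the records where neither value contains
    the other.
    """
    pair_counts = Counter()
    broken = []
    for record in records:
        a, b = record.get(left), record.get(right)
        x, y = (a, b)
        if ignore_case and a and b:
            x, y = a.lower(), b.lower()
        # Assumption: blank or missing values never count as contained.
        if x and y and (x in y or y in x):
            pair_counts[(a, b)] += 1
        else:
            broken.append(record)
    return pair_counts.most_common(), broken
```

The `broken` list corresponds to the records worth reviewing when a relationship between the two attributes is expected, for example a blank value that could be derived from the other attribute.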
In this example, a number of attributes are checked for a Contains relationship. A relationship is found between the FirstName and EmailAddress attributes, where the FirstName is often contained in the EmailAddress:
Summary View
Drilling down on the 1829 records where the EmailAddress contains the FirstName attribute reveals the following view of all the distinct pairs of values where the relationship was found:
Drilldown on related records
Oracle® Enterprise Data Quality Help version 9.0
Copyright © 2006, 2011 Oracle and/or its affiliates. All rights reserved.