Ensuring Data Quality

Ensuring Data Quality

Oracle Warehouse Builder provides a number of sophisticated data quality features.

The Name and Address operator identifies and corrects errors and inconsistencies by comparing input data to data libraries that are supplied by third party name-and-address-cleansing software vendors.

The Match and Merge operator identifies and merges potential duplicate records. The Match-Merge feature provides the capability to apply complex match and merge rules.

Topics

This lesson will discuss the following:

Overview
Prerequisites
Examining the Match/Merge operator
Executing the Mapping
Summary

This lesson will take approximately 45 minutes to complete.

Move your mouse over this icon to show all screenshots. You can also move your mouse over each individual icon to see only the screenshot associated with it.


Overview

This lesson examines the Match Merge operator. You will use the existing mapping MAP_MATCH_MERGE_CUSTOMER_STG. This lesson highlights the power of the Match Merge functionality to identify and merge duplicate records.

Back to Topic List

Back to Topic List

In order for this lesson to work successfully, you will need to have performed the following:

1.

Completed Overview lesson.

2.

Completed Logging in to OWB lesson.

3.

Completed Importing Source Metadata lesson.

4.

Completed Defining Target Module lesson.

5.

Completed Dimensional Design using Oracle Warehouse Builder lesson.

6 .

Completed Designing ETL Data Flow Mapping lesson.

7 .

Completed Deploying Targets lesson.

 

Examining Match Merge operator in a Mapping

Back to Topic List

Unlike the Name and Address operator the match and merge feature is independent of any third party software. In this lesson we will examine an existing mapping that uses the Match and Merge operator.

The goal of the mapping is to load MAIL_MERGE_CUSTOMERS_STG table to identify which customers will be targeted during the next marketing campaign. Obviously, duplicates, or likely duplicates, should be removed and also, multiple customers that share the same address should only receive the information once. Warehouse Builder provides all this functionality in a very powerful operator Match-Merge.

1.

In the OWB10G_DEMO project, expand Databases, Oracle,STG, Mappings. Double click the MAP_MATCH_MERGE_CUSTOMER_STG mapping.

Move your mouse over this icon to see the image

 

2.

The selected mapping opens in the Mapping Editor.

Move your mouse over this icon to see the image

This example uses 2 files (customers_system1_txt and customers_system2_txt) with the same structure.

It combines the two data sets in the SETOP operator using the UNION ALL operation.

 

 

3.

Right click the MATCH_MERGE operator and select Edit.

Move your mouse over this icon to see the image

In this example you are looking at a previously designed mapping. If you would build this mapping and bring in the Match-Merge operator from the Toolbox, then you would walk through a wizard guiding you through the steps that you will see on the tabs in the next steps.

 

4.

The Match-Merge Editor window opens. Click on the Groups tab.

Move your mouse over this icon to see the image

The match merge operator has 3 groups: INGRP1 containing the input attributes,
MERGE containing the merged records (usually that means fewer records than INGRP1)
and XREF, which is the cross reference group that contains all records from INGRP1
including some of the merged attributes.

 

5.

Click on the Input Connections tab.

Move your mouse over this icon to see the image

During the creation of the mapping, the input was selected from the SETOP output. The input attributes in MATCH_MERGE match the output attributes of SETOP.

 

6.

Click on the Input Attributes tab.

Move your mouse over this icon to see the image

Review the attribute data types and lengths. In general if you go through the wizard, you do not need to change any of these, because Warehouse Builder pre-populates those based on the output attributes.

 

7.

Click on the Merge Output tab.

Move your mouse over this icon to see the image

The Merge Output shows which attributes will appear in the MERGE group. In this example the merge is based on name and address information and assign an ID.

 

8.

Click on the Cross-Reference Output tab.

Move your mouse over this icon to see the image

The cross-reference output shows the information from the incoming attributes as well as the ID from the matched set (MM_ID represents the matched ID). Warehouse Builder displays the merged attributes in Source Attributes with a default prefix of "MM_". This example cleanses the source systems, knowing the ID of the person a record was merged with during the Match-Merge operation.

 

9.

Click on the Match Bins tab.

Move your mouse over this icon to see the image

The concept of Match Bins is used to identify potential matches. For example, in this exercise we use CITY, STATE and COUNTRY as match bins, which means that only records for which these attributes match are merge candidates. Records with different values for CITY, STATE and COUNTRY will definitely be considered different. Obviously, in complex environments you want to make a smart choice for the match bins, because the more attributes you select, the faster the system will run.

 

10.

Click on the Match Rules tab.

Move your mouse over this icon to see the image

The match rules expose rules you can create to identify how rows can be matched.

 

11.

Select the MA_NAME rule if not already selected. Click on Person under the Rule Type column. The Rule Type in the upper panel exposes different types, each with their own characteristics as you see in the drop down list.

Move your mouse over this icon to see the image

In this step the Rule Type Person has been selected. The bottom right panel shows some of the attributes as well as the mandatory role the attribute plays.

 

12.

Ensure that the MA_NAME rule is selected. Click on the Details tab. Stretch the window and stretch the bottom panel.

Move your mouse over this icon to see the image

The Rule Options enable you how to identify matches. Obviously, with the number of Rule Types, roles and role options there is a nearly infinite number of ways to match and merge records. Again, the functionality is not restricted to name and address information.

 

13.

Click on the MA_ADDRESS rule. Notice the attributes and the roles in the bottom right panel. Click on Details tab on the bottom left panel. Again you see there is a multitude of Rule Options that can be defined for an address merge.

Move your mouse over this icon to see the image

Move your mouse over this icon to see the image

 

14.

In the Match-Merge Editor click on the Merge Rules tab. Click on Details tab in the bottom left, rule Minimum is selected. Click OK or Cancel to close the Match-Merge Editor.

Move your mouse over this icon to see the image

In this step you specify how sets of attributes will be merged. Merge rules can be defined on an attribute-by-attribute basis, but also for a set of attributes at once (called a record). In this case we do not have a preference for which record or for which attributes in the record have a preference.

 

Executing the Mapping

Back to Topic List

In this topic you will learn how to execute a mappng that is successfully deployed.

 

1.

Close the Mapping Editor.

In the OWB client select Project > Deployment Manager. In the Select Runtime Repository Connection dialog from the Connection drop-down list choose the OBE_RUNTIME_CONNECTION or use <the runtime connection that you created in Lesson 7> and click OK. In the Connection Information dialog, for password type connect_user or use <the password that you used for the connect as user account> and click OK.

Expand TARGET_LOC, expand STG and select mapping MAP_MATCH_MERGE_CUSTOMERS_STG.

Move your mouse over this icon to see the image

Move your mouse over this icon to see the image

Move your mouse over this icon to see the image

 

2.

Before you generate this mapping ensure that the 2 external tables (STG.CUSTOMERS_SYSTEM1_EXT and STG.CUSTOMERS_SYSTEM2_EXT) are successfully deployed.

In the TARGET_LOC location, expand the STG module, expand External Tables. Select CUSTOMERS_SYSTEM1_EXT hold CTRL key on the kepboard and select CUSTOMERS_SYSTEM2_EXT. Click Default Action. Generate and Deploy these two External Tables.

The two tables (MAIL_MERGE_CUSTOMERS_STG and CLEANUP_CUSTOMERS_STG) must be successfully deployed.

In the Deployment Manager window click the MAP_MATCH_MERGE_CUSTOMERS_STG mapping and click Default Action.

Move your mouse over this icon to see the image

 

3.

In the Deployment Manager from the toolbar click Generate/Deploy. Click Commit in th Confirm Commit dialog.

Move your mouse over this icon to see the image

 

4.

In the Pre-Deployment Generation results window click Deploy if the mapping was successfully generated.

Move your mouse over this icon to see the image

 

5.

In Deployment results window click OK.

Move your mouse over this icon to see the image

 

6.

In the Deployment Manager right click MAP_MATCH_MERGE_CUSTOMERS_STG and select Execute.

Move your mouse over this icon to see the image

 

7.

In the Execution Parameters dialog click Execute.

Move your mouse over this icon to see the image

Notice 280 records were inserted into the cleanup table whereas only 185 records were identified as mail merge customers.

Move your mouse over this icon to see the image

Close the Execution Results window and the OWB Deployment manager.

 

 

Summary

Back to Topic List

Warehouse Builder provides powerful data quality features. Name and Address cleansing can be used to perform name and address cleansing, validation and standardization. The Match and Merge functionality is extremely powerful when it comes down to identifying and merging potential duplicate records.

After performing the above steps you observe that for marketing campaign 95 (280-185) out of 280 candidate addresses as being duplicate (34%). Leveraging these features can save a lot of money and provide a much more consistent view to customers.

Move your mouse over this icon to hide all screenshots

 

 

 

E-mail this page
Printer View Printer View
Oracle Is The Information Company About Oracle | Oracle RSS Feeds | Careers | Contact Us | Site Maps | Legal Notices | Terms of Use | Privacy