Matching and Merging Records

The goal of this tutorial is to match and merge a set of customer records into a single customer master table.

Note: The screenshots in this tutorial may appear slightly different from yours depending on the order in which you perform the other tutorials in this OBE Series. You can ignore the irrelevant items you see on your screens.

Approximately 1 hour

Topics

This tutorial covers the following topics:

Overview
Prerequisites
Summary

Place the cursor over this icon to load and view all the screenshots for this tutorial. (Caution: This action loads all screenshots simultaneously, so response time may be slow depending on your Internet connection.)

Note: Alternatively, you can place the cursor over each individual icon in the following steps to load and view only the screenshot associated with that step.

Overview

Warehouse Builder delivers record matching and merging capabilities to the ETL space in the core product. Matching and merging allows you to use business rules to detect potential matches between two or more records. These matching records can then be merged into a single master record. To ensure proper auditing and tracing of these changes, a cross reference table is available tracking all merges into the master record.

In this tutorial you map multiple customer record sets and merge them into a single customer master table. The goal is to load the MAILMERGE_CUSTOMERS_STG table to identify which customers will be targeted during the next marketing campaign.

Duplicates, or likely duplicates, should be removed. In addition, multiple customers that share the same address should only receive the information once.

Back to Topic List

Before you perform this tutorial, you should:

1.

Complete the Setting Up the Oracle Warehouse Builder Project tutorial.

Back to Topic List

Creating a Match-Merge Mapping

The first step is to identify what source data need to be merged. Perform the following steps to create a Match-Merge Mapping:

1.

Assuming that you logged in as OWB user, from the main Warehouse Builder console, expand HANDSON > Databases > Oracle > DQ_TGT. Right-click on Mappings and select New.

 

2.

Enter CUST_MM_MAP for the name and click OK.

 

3.

The mapping editor will open. In the top left Explorer panel, expand Databases > Oracle > DQ_SRC > Tables. Drag and drop CUST_SYS1 into the mapping canvas.

 

4.

Drag and drop CUST_SYS2 into the mapping canvas.

 

5.

Maximize the operators by clicking the maximize icon for each table and click Auto-Layout in the toolbar.

 

6.

In the Palette on the lower left side, drag a SET OPERATION operator to the right of the two source tables.

 

7.

Drag a line from the CUST_SYS1.INOUTGRP1 into SET_OPERATION.INGRP1.

 

8.

Collapse SET_OPERATION.INGRP1 and maximize the SET_OPERATION operator.

 

9.

Drag a line from the CUST_SYS2.INOUTGRP1 into SET_OPERATION.INGRP2.

 

10.

The result is shown. Collapse SET_OPERATION.INGRP2.

 

Back to Topic List

You can now create a Match-Merge operator to identify the criteria for which the data is evaluated. Perform the following steps:

1.

Drag and drop the Match-Merge operator from the palette to the right of the SET_OPERATION.

 

2.

The Match-Merge wizard launches. On the Welcome page, click Next.

 

3.

Accept the default name and click Next.

 

4.

The Match-Merge operator has 3 groups:

INGRP1

Contains input attributes.

MERGE

Contains the merged records (usually that means fewer records than INGRP1).

XREF

Cross reference groups contain the link between the original and merged data sets. This is the tracking mechanism used when a merge is performed.

Click Next.

 

5.

Expand SET_OPERATION and select OUTGRP1, then select > to move all the attributes under OUTGRP1 to the list of INGRP1 attributes under Mapped Attributes.

 

6.

Click Next.

 

7.

Review the attribute data types and lengths. In general, if you go through the wizard, you don't need to change any of these values because Warehouse Builder pre-populates them based on the output attributes. Click Next.

 

8.

The Merge Output chooses which attributes appear in the MERGE group (the cleansed group). Select the following Source Attributes: ID, FIRST_NAME, LAST_NAME, STREET_ADDRESS, ZIP_CODE, CITY, STATE, and COUNTRY. Then click > to move them to the list of Output Attributes. Then click Next.

 

9.

Shift-Select from ID to MM_ID and click > to move the selected attributes to the right. MM_ID will be the link between the merged records and the original record. This is what the "merged into" record is for the original record with the identifier ID.

 

10.

Click Next.

 

11.

Match Bins are used to identify potential matches. Match merge bins limit the number of potential matches in a data set, thus improving performance of the match algorithm. In this example, you use CITY, STATE and COUNTRY as a match bin, which means that only records for which these attributes match are merge candidates. Select INGRP1.CITY, INGRP1.STATE, and INGRP1.COUNTRY from the list of Available Attributes and click > to move them to the right.

 

12.

Click Next.

 

13.

Click Add to add a new match rule to the wizard.

 

14.

The match rule determines how you identify a set of records as matching. Change the name to MA_NAME and select Person as the Rule Type.

 

15.

In the list of attributes in the bottom panel of the wizard, select FIRST_NAME and LAST_NAME and click >. For the Roles, select First name Standardized for FIRST_NAME and Last Name for LAST_NAME. Then click the Details tab.

 

16.

For the Rule Options, under First Name, check Match on substrings and Similarity score. Change the Similarity score to 90 using Edit Distance algorithm.

You have finished defining the MA_NAME match rule. Now you need to define one more rule. In the top window, click Add.

 

17.

Change the name to MA_ADDRESS and select Address as the Rule Type.

 

18.

In the list of attributes in the bottom panel of the wizard, select STREET_ADDRESS, CITY and STATE and click >. For the Roles, select Primary Address, City and State respectively. Then click the Details tab.

 

19.

Make sure that Address Line Similarity is checked, the similarity is set to 90 and Edit Distance is the algorithm. Then click Next.

 

20.

Click Add to add a new merge rule to the wizard.

 

21.

Change the name to ME_ID and select Min Max Record as the Rule Type.

Upon choosing the Rule Type, the Attribute field will contain RECORD and the list of Attributes will appear in the lower pane.

 

22.

You want to specify how records will be merged. Merge rules can be defined on an attribute-by-attribute basis, but also for a set of attributes at once (called a record). Select ID and click >. Then click the Details tab.

 

23.

Select ID as the Selecting attribute and Minimum as the Attribute relation. These values determine which matching records to merge on. Then click Next.

 

24.

If you expand the match-merge operators, you can see the output. Click Finish.

The match-merge operator has been created.

 

Back to Topic List

Now that you have defined what the Match-Merge criteria are, you need to define where the data goes when the Match-Merge is executed. Perform the following steps to map the merged output to a target table:

1.

In the Explorer Panel of the Mapping Editor, expand Oracle > DQ_TGT > Tables. Drag and drop the MAILMERGE_CUSTOMERS_STG table to the right of the MATCHMERGE operator.

 

2.

Map the MERGE group into the MAILMERGE_CUSTOMERS_STG table operator by dragging a line from the MERGE Group header to the INOUTGRP1 of the MAILMERGE_CUSTOMERS_STG operator.

 

3.

Select Match by name of source and target attributes then click Go.

 

4.

The list of attributes that will be matched are shown. Click OK.

 

5.

Drag and drop the CLEANUP_CUSTOMERS_STG table to the right of the MATCHMERGE operator.

 

6.

Map the XREF group into the CLEANUP_CUSTOMERS_STG table operator by dragging a line from the XREF Group header to the INOUTGRP1 of the CLEANUP_CUSTOMERS_STG operator.

 

7.

Select Match by name of source and target attributes then click Go.

 

8.

The list of attributes that will be matched are shown. Click OK.

 

9.

Validate your mapping. Select Mapping > Validate from the menus.

 

10.

You should receive a validation successful message in the Generation Results window. Close the Mapping Editor.

 

11.

Save your work. Select Design > Save All then click Yes.

 

Back to Topic List

You are now ready to generate the code for the match-merge mapping and deploy it into the database. Perform the following steps:

1.

In the Design Center, select Tools > Control Center Manager.

 

2.

Expand DQ_TGT_LOCATION, then expand and highlight DQ_TGT and click the Default Actions button to set the objects for deployment.

 

3.

Verify the objects that are going to be created. Select File > Deploy > To Control Center.

 

4.

Monitor the deployment job in the job panel. You will see three separate stages (when the icon changes). When the job finishes successfully, you will see the green check box.

 

Back to Topic List

You can now run the mapping to actually merge the data into your tables. Perform the following steps:

1.

Expand the Mapping node under DQ_TGT, right-click on CUST_MM_MAP and select Start.

 

2.

Monitor the execution of the mapping in the job panel. When you receive the successfully message, double-click on the CUST_MM_MAP job in the Control Center Jobs panel.

 

3.

Click the Execution Results tab.

 

4.

Notice that 280 records were inserted into the cleanup table whereas only 185 records were identified as mail merge customers. What this means is that 95 out of 280 candidate addresses were duplicates or invalid (34%). When done reviewing, close this dialog. Then close the Control Center Manager.

 

Back to Topic List

You can view the merged output. Perform the following steps:

1.

In the Design Center, under DQ_TGT > Tables, right-click on the CLEANUP_CUSTOMERS_STG table and select Data...

 

2.

Your data may look slightly different however, you can see that the MM_ID points back to the records after merging keeping the link between the original record and the new records ensuring compliance of your cleansing.

 

Back to Topic List

In this tutorial, you learned how to:

Create a match-merge mapping

Create a match-merge operator using the match-merge wizard

Map the merged output

Deploy, execute and view the merged output

Back to Topic List

Place the cursor over this icon to hide all screenshots.