Oracle Warehouse Builder provides a number of sophisticated
data quality features.
The Name and Address operator identifies and corrects errors
and inconsistencies by comparing input data to data libraries that are supplied
by third party name-and-address-cleansing software vendors.
The Match and Merge operator identifies and merges potential
duplicate records. The Match-Merge feature provides the capability to apply
complex match and merge rules.
This lesson will take approximately 45 minutes to complete.
Viewing Screenshots
Move your mouse over this icon to show
all screenshots. You can also move your mouse over each individual icon to see
only the screenshot associated with it.
Overview
This lesson examines the Match Merge operator. You will
use the existing mapping MAP_MATCH_MERGE_CUSTOMER_STG. This
lesson highlights the power of the Match Merge functionality to identify and
merge duplicate records.
Unlike the Name and Address operator the match and
merge feature is independent of any third party software. In this lesson we
will examine an existing mapping that uses the Match and Merge operator.
The goal of the mapping is to load MAIL_MERGE_CUSTOMERS_STG
table to identify which customers will be targeted during the next marketing
campaign. Obviously, duplicates, or likely duplicates, should be removed and
also, multiple customers that share the same address should only receive the
information once. Warehouse Builder provides all this functionality in a very
powerful operator Match-Merge.
1.
In the OWB10G_DEMO project, expand Databases,
Oracle,STG, Mappings.
Double click the MAP_MATCH_MERGE_CUSTOMER_STG mapping.
2.
The selected mapping opens in the Mapping Editor.
This example uses 2 files (customers_system1_txt
and customers_system2_txt) with the same structure.
It combines the two data sets in the SETOP
operator using the UNION ALL operation.
3.
Right click the MATCH_MERGE operator and select Edit.
In this example you are looking at a previously designed mapping. If
you would build this mapping and bring in the Match-Merge operator from
the Toolbox, then you would walk through a wizard guiding you through
the steps that you will see on the tabs in the next steps.
4.
The Match-Merge Editor window opens. Click on the Groups
tab.
The match merge operator has 3 groups: INGRP1 containing
the input attributes, MERGE containing the merged records (usually that means
fewer records than INGRP1)
and XREF, which is the cross reference group that contains
all records from INGRP1
including some of the merged attributes.
5.
Click on the Input Connections
tab.
During the creation of the mapping, the input was selected from the SETOP
output. The input attributes in MATCH_MERGE match the output attributes
of SETOP.
6.
Click on the Input Attributes tab.
Review the attribute data types and lengths. In general if you go through
the wizard, you do not need to change any of these, because Warehouse
Builder pre-populates those based on the output attributes.
7.
Click on the Merge Output
tab.
The Merge Output shows which attributes will appear in the MERGE group.
In this example the merge is based on name and address information and
assign an ID.
8.
Click on the Cross-Reference Output
tab.
The cross-reference output shows the information from the incoming attributes
as well as the ID from the matched set (MM_ID represents the matched ID).
Warehouse Builder displays the merged attributes in Source Attributes
with a default prefix of "MM_". This example cleanses the source
systems, knowing the ID of the person a record was merged with during
the Match-Merge operation.
9.
Click on the Match Bins tab.
The concept of Match Bins is used to identify potential matches. For
example, in this exercise we use CITY, STATE and COUNTRY as match bins,
which means that only records for which these attributes match are merge
candidates. Records with different values for CITY, STATE and COUNTRY
will definitely be considered different. Obviously, in complex environments
you want to make a smart choice for the match bins, because the more attributes
you select, the faster the system will run.
10.
Click on the Match Rules
tab.
The match rules expose rules you can create to identify how rows can
be matched.
11.
Select the MA_NAME rule if
not already selected. Click on Person under the Rule
Type column. The Rule Type in the upper panel exposes different types,
each with their own characteristics as you see in the drop down list.
In this step the Rule Type Person has been selected.
The bottom right panel shows some of the attributes as well as the mandatory
role the attribute plays.
12.
Ensure that the MA_NAME rule
is selected. Click on the Details tab. Stretch the window
and stretch the bottom panel.
The Rule Options enable you how to identify matches. Obviously, with
the number of Rule Types, roles and role options there is a nearly infinite
number of ways to match and merge records. Again, the functionality is
not restricted to name and address information.
13.
Click on the MA_ADDRESS rule.
Notice the attributes and the roles in the bottom right panel. Click on
Details tab on the bottom left panel. Again you see there
is a multitude of Rule Options that can be defined for an address merge.
14.
In the Match-Merge Editor click on the Merge
Rules tab. Click on Details tab in the bottom
left, rule Minimum is selected. Click OK
or Cancel to close the Match-Merge Editor.
In this step you specify how sets of attributes will be merged. Merge
rules can be defined on an attribute-by-attribute basis, but also for
a set of attributes at once (called a record). In this case we do not
have a preference for which record or for which attributes in the record
have a preference.
In this topic you will learn how to execute a mappng
that is successfully deployed.
1.
Close the Mapping Editor.
In the OWB client select Project>Deployment Manager. In the Select Runtime Repository
Connection dialog from the Connection drop-down list choose the OBE_RUNTIME_CONNECTION
or use <the runtime connection that you created in Lesson 7> and
click OK. In the Connection Information dialog, for password
type connect_user or use <the password that you used
for the connect as user account> and click OK.
Expand TARGET_LOC, expand STG and select
mapping MAP_MATCH_MERGE_CUSTOMERS_STG.
2.
Before you generate this mapping ensure that
the 2 external tables (STG.CUSTOMERS_SYSTEM1_EXT and STG.CUSTOMERS_SYSTEM2_EXT)
are successfully deployed.
In the TARGET_LOC location, expand the STG
module, expand External Tables. Select CUSTOMERS_SYSTEM1_EXT
hold CTRL key on the kepboard and select CUSTOMERS_SYSTEM2_EXT. Click
Default Action. Generate and Deploy these two External Tables.
The two tables (MAIL_MERGE_CUSTOMERS_STG and CLEANUP_CUSTOMERS_STG)
must be successfully deployed.
In the Deployment Manager window click the MAP_MATCH_MERGE_CUSTOMERS_STG
mapping and click Default Action.
3.
In the Deployment Manager from the toolbar
click Generate/Deploy. Click Commit in
th Confirm Commit dialog.
4.
In the Pre-Deployment Generation results window
click Deploy if the mapping was successfully generated.
5.
In Deployment results window click OK.
6.
In the Deployment Manager right click MAP_MATCH_MERGE_CUSTOMERS_STG
and select Execute.
7.
In the Execution Parameters dialog click Execute.
Notice 280 records were inserted into the cleanup table whereas only
185 records were identified as mail merge customers.
Close the Execution Results window and the OWB Deployment manager.
Warehouse Builder provides powerful data quality features.
Name and Address cleansing can be used to perform name and address cleansing,
validation and standardization. The Match and Merge functionality is extremely
powerful when it comes down to identifying and merging potential duplicate records.
After performing the above steps you observe that for marketing
campaign 95 (280-185) out of 280 candidate addresses as being duplicate (34%).
Leveraging these features can save a lot of money and provide a much more consistent
view to customers.
Move your mouse over this icon to hide all screenshots