Oracle9i Warehouse Builder Data Quality FAQ

ORACLE9i Warehouse Builder 9.2

Data Quality


1.0 Name and Address Cleansing
2.0 Match-Merge
3.0 Other Information

 

1.0 Name and Address Cleansing

1.1 What is Name and Address Cleansing?
Name and Address Cleansing is the group of transformations performed on data containing individual and business names as well as domestic and foreign addresses for the purpose of improving the quality of data. Such transformations are usually referred to as parsing, standardization, correction and augmentation:

  • Name/address parsing is the breakdown of non-discrete input into discrete name or address components. 
  • Name/address standardization is the modification of components to a standard version acceptable to a postal service or suitable for record matching.
  • Postal correction involves matching an input address with postal database entries to verify and/or correct an address.
  • Augmentation adds derived information to the data, such as gender based on name, or collection of census and geo-location data

1.2 How is Name and Address Cleansing accomplished in Warehouse Builder?
Name and address cleansing in Warehouse Builder is a built-in transformation and is integrated into the ETL functionality as Name and Address Operator. Name and Address Operator offers the full range of transformations for name and address data: parsing, standardization, correction and augmentation. At the design time this wizard-driven operator is used to model Name and Address rules inside Warehouse Builder. At runtime, the deployed Name and Address transformations transparently access parsers and data provided by the third-party vendors.

1.3 What is Oracle9i Pure Name and Address? 
Oracle9i Pure Name and Address is the prior implementation of Name and Address cleansing, based on bundling a single third-party vendor software. Pure Name and Address ended with Warehouse Builder 9.0.4 and is no longer available.

1.4 What is the current architecture of Name and Address cleansing in Warehouse Builder? 
Warehouse Builder users now have a choice of multiple certified third-party data quality vendors. The use of Name and Address Operator at the design time does not change in any way. However, at runtime Warehouse Builder transparently accesses adaptors build by the third party vendors and the data they supply.

1.5 How are Name and Address components of Warehouse Builder installed?
Warehouse Builder Name and Address Server is installed automatically as part of Warehouse Builder server side install. Third-party adapters and data are installed separately by following the vendors’ own installation programs or scripts. Warehouse Builder becomes aware of their location through the use of environment variables and property files modified by third-party installation programs automatically.

1.6 How are Name and Address components of Warehouse Builder licensed?
Warehouse Builder Name and Address Server is included at no extra charge. Third party data quality vendors sell their software separately and directly to the customer.

1.7 Who are the third-party data quality vendors partnering with Warehouse Builder?
At present Warehouse Builder is certified to work with Trillium Software and First Logic. Other leading data quality vendors have partnered with Oracle and are in the process of building their adapters for Warehouse Builder.

1.8 How can the sales, consulting or product support of the third-party data quality vendors be contacted?
The data quality partner matrix is available at /products/warehouse/index.html. It lists contacts and is updated periodically.

1.9 What countries are supported?
The country support varies by vendor. The data quality partner matrix referenced above provides a list of each vendor’s supported countries.

1.10 What are the prices of third-party data quality software?
The prices obviously vary by vendor. Contact the vendor directly for pricing information. One of the benefits of Warehouse Builder working with multiple partners is the ability to offer different price points.

1.11 What about customers with current licenses for Oracle 9i Pure Name and Address?
These customers can continue using Pure Name and Address. Warehouse Builder 9.2 is certified to work with Oracle9i Pure Name and Address 9.0.4. They would need to perform slightly different installation steps as described in the 9.2 release notes. Alternatively, they can choose to take advantage of special offers from the data quality vendors and trade-in their Pure Name and Address licenses. Please contact the data quality partners directly for details.

1.12 How does Name and Address cleansing work in the Oracle 9i RAC environment?
Warehouse Builder supports Real Application Clusters (RAC) features of Oracle 9i. One way this support is accomplished is by allowing the install of OWB's runtime service on multiple nodes in a cluster. The benefits of such configuration are the transparent failover between nodes and the near linear increase of processing performance with each added node.

Taking advantage of OWB Name and Address Cleansing in the RAC environment requires installation of the following components on each node in a cluster:

  • OWB Name and Address Server - performed by "OWB Runtime Assistant", option "Register RAC Instance"
  • A third-party Name and Address adapter - performed by third-party installation programs
  • A third-party Name and Address set of data - performed by third-party installation programs

Back to Top

2.0 Match-Merge

2.1 What is Matching and Merging?
Matching is the process of determining, through business rules, which records refer to the same logical data. Merging is the business rules driven consolidation of the data from the matched set into a single record.

2.2 What are the Deduplication, Householding and Record Linking?
In the area of Data Quality, all three are instances of using Match-Merge for a particular purpose:

  • Deduplication is the process of matching and merging for the purpose of removing the duplicate records, especially customer related records. This contributes to achieving the single view of the customer.
  • Householding is the process of matching customers belonging to the same household, usually identified by the same address. Customer names are not merged, however, they are linked to the address that is stored once. The benefit of householding is the improved ability to understand and target customers.
  • Record linking is the more generic instance of householding. The records may need to be linked for purposes other than determining households, for example, linking business customers’ branches and subsidiaries to one parent entity.

2.2 How is Matching and Merging accomplished in Warehouse Builder?
Matching and Merging in Warehouse Builder is a built-in transformation and is integrated into the ETL functionality as Match-Merge Operator. Match-Merge Operator has a wizard driven interface with a powerful set of UI controlled matching and merging algorithms.

2.3 How is Warehouse Builder Match-Merge installed?
Match-Merge is inseparable from Warehouse Builder and is installed automatically like any other operator.

2.4 How is Warehouse Builder Match-Merge licensed?
Match-Merge is included with Warehouse Builder at no extra charge. There is no dependency on third-party software for Match-Merge.

2.5 What are fuzzy matching, deterministic matching and probabilistic matching?
Warehouse Builder employs the elements of the fuzzy logic and provides both deterministic and probabilistic matching algorithms.
  • In general, fuzzy logic resembles human reasoning in its use of approximate information and uncertainty to generate decisions. In relation to matching, the term is used loosely to describe the approach that relies on rules that are imprecise rather than precise and operates on data with boundaries that are not sharply defined.
  • Deterministic matching gives equal weight to different types of information a record may contain. For example, a deterministic approach might place equal reliance on a match between the names on two records or a match between two birth dates.
  • Probabilistic matching exploits the statistical probability that a match on particular items is more or less likely to indicate that the records are the same. For example, birth date information is subject to errors made by a mistake on a single digit, and the number of possible birth dates is relatively small. Names, in contrast, are more likely to be recognizable even if a single error is made. Probabilistic matching thus allows assigning appropriate weights to different attributes and then compares the total score to the threshold that defines a successful match.

2.6 What Match rule types does Warehouse Builder offer?

  • Person Rule. Built-in rules for matching names.  This rule requires users to set what data within the record make up a person’s name.  The rule offers different parameters that enable specific person name matching such as “Match on initials”, “Match hyphenated names”.
  • Firm Rule. Built-in rules for matching business names. This rule requires users to set what data within the record make up a firm name. The rule offers different parameters that enable specific firm matching such as “Strip noise words” (Inc, Corp), “Match on acronyms”.
  • Address Rule. Built-in rules for matching Addresses. This rule requires users to set what data within the record make up an address. The rule offers different parameters that enable specific address matching such as “Match either street or post office box”, “Match on blank secondary address”.
  • Conditional Rule. The user sets each column to be compared with a comparison algorithm to use.  Algorithms such as Exact, Standard Exact, Soundex, Similarity, Standard Similarity, Partial Name, Abbreviation, and Acronym are supplied. Each column must match when compared using the desired algorithm for two records to be considered a match. In other words, the comparisons are logically ‘AND’ed together.
  • Weight Rule. The user gives a weight to each column included in the comparison and an overall score that must be achieved for the match. Each column that has a weight is compared using a similarity algorithm. This rule implements the probabilistic matching approach.
  • Custom Match Rule. The users can write their own comparison algorithms desired for matching.  They can also reference one of the other match rule types to logically ‘AND’ or ‘OR’ them together to produce a unique rule type.

2.7 What Merge rule types does Warehouse Builder offer?
Once the records are matched, they can be merged based on the following rule types:

  • Any: This rule will pick any (first non-null) value of the match set for this attribute.
  • Rank: Rank the values of the match set for a particular attribute and attribute value, and use the associated attribute value of the record that satisfies the highest rank. For example, choose the record where ‘Is_Found’ flag is True.
  • Sequence: User can specify a database sequence for this rule. The nextval of the sequence will be used for the value.
  • Min/Max: User can specify an attribute and a relation to choose the record to be used as a source for the merge attribute. For example, choose the record with a greater date.
  • Copy: User can choose a value from a different previously merged value.
  • Custom: User can create a PL/SQL package function to select the merge value.

2.8 What are Match Bins?
Match Bins group records together prior to matching to limit the set of necessary comparisons to a manageable number. The user designates an attribute or attributes by which to group the data. When Warehouse Builder matches the rows, it compares each row with the subsequent row for all rows within the same group. This can greatly enhance performance since Warehouse Builder searches for matches only within the “match bins” and not throughout the entire data set. The number of comparisons performed grows with the size of the bin. The formula is n=(b*(b-1))/2, where n is number of comparisons, b is number of records in a bin. To match 5 records, Warehouse Builder must perform 10 comparisons. To match 50 records, Warehouse Builder must perform 1,225 comparisons. To match 500 records, Warehouse Builder must perform 124,750 comparisons.

2.9 What is the relationship between Name and Address cleansing and Match-Merge?
It is recommended to match the data after it has been cleansed through Name/Address. However, the data that is not Name/Address centric, such as product data, or the data that is already deemed clean can be processed by Match-merge alone.

Back to Top

3.0 Other Information
3.1 What are the differences between Warehouse Builder and Pure Integrate?

Pure Integrate has been desupported and all of its functionality has been migrated to Warehouse Builder as of this release 9.2. Oracle9i Warehouse Builder combines complete ETL, metadata management and Data Quality functionality into one tool and through the same User Interface.

3.2 Where can I learn more about Warehouse Builder Name & Address cleansing as well as Match-merge?

  • Oracle9i Warehouse Builder User Guide and Online Help
  • Data Quality White Paper
  • Name/Address and Match-Merge viewlets

Back to Top
 

 

 
 

Copyright © 2003, Oracle Corporation. All rights reserved. 

Contact Us | Legal Notices and Terms of Use | Privacy Statement

E-mail this page
Printer View Printer View
Oracle Is The Information Company About Oracle | Oracle RSS Feeds | Careers | Contact Us | Site Maps | Legal Notices | Terms of Use | Privacy