
Data Types Profiler

The Data Types Profiler analyzes the values in one or more attributes to assess whether they conform to a consistent data type (text, number or date).

Use

Use the Data Types Profiler to understand the types of data found in each attribute, to assess whether the data type in each attribute is consistent, and to find values whose data type may be incorrect, for example because data was entered in the wrong field, or in a field with the wrong data type constraint.

The Data Types Profiler analyzes for three basic types of data: text, numbers and dates.

Null values are counted separately from these three types.
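The classification into text, numbers, dates and nulls can be sketched in Java. This is a minimal illustration, not the processor's actual implementation. The following are all assumptions: the number syntax, the order of the checks (numbers before dates), the treatment of blank strings as null, and the three sample date patterns that stand in for the Date Formats Reference Data.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.List;
import java.util.Locale;

public class DataTypeClassifier {

    // Hypothetical categories: the three basic types plus null.
    enum DataType { NULL, NUMBER, DATE, TEXT }

    // Hypothetical stand-in for the Date Formats reference data.
    private static final List<String> DATE_FORMATS =
            List.of("dd/MM/yyyy", "yyyy-MM-dd", "dd-MMM-yyyy");

    // Assumed number syntax: optional sign, digits, optional decimal part.
    private static final String NUMBER_REGEX = "[+-]?\\d+(\\.\\d+)?";

    static DataType classify(String value) {
        if (value == null || value.trim().isEmpty()) {
            return DataType.NULL;   // nulls are counted separately
        }
        String v = value.trim();
        if (v.matches(NUMBER_REGEX)) {
            return DataType.NUMBER; // checked first, so 8-digit strings are numbers
        }
        for (String pattern : DATE_FORMATS) {
            if (matchesWholeValue(v, pattern)) {
                return DataType.DATE;
            }
        }
        return DataType.TEXT;       // anything else is treated as text
    }

    // True only if the pattern parses the entire value, with strict validation.
    static boolean matchesWholeValue(String value, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
        format.setLenient(false);   // reject impossible dates such as 31/02/2011
        ParsePosition pos = new ParsePosition(0);
        return format.parse(value, pos) != null && pos.getIndex() == value.length();
    }

    public static void main(String[] args) {
        for (String v : new String[] {"42", "15/03/2011", "London", "", null}) {
            System.out.println(v + " -> " + classify(v));
        }
    }
}
```

Checking numbers before dates mirrors the note on the yyyyMMdd format below: an all-digit value is always classified as a number.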

Configuration

Inputs

Any attributes that you wish to analyze for data type consistency.

Options

Option | Type | Purpose | Default Value
List of recognized date formats | Reference Data (Date Formatting Category) | Recognizes dates in a variety of different formats | *Date Formats (see the note below)

Note on Date Formats

The Date Formats Reference Data used by the Data Type Check must conform to the standard Java 1.5.0 SimpleDateFormat API.

To understand how to add Reference Data entries so that dates are recognized correctly, see the Java SimpleDateFormat API documentation.

Note: The date format yyyyMMdd, although valid and included in the Date Formats Reference Data, is not recognized by this processor. Because it contains no alphabetic characters or separators, a value in this format (for example, 20110315) cannot be distinguished from an eight-digit number.
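The ambiguity can be demonstrated with Java's SimpleDateFormat: the same eight-digit string parses both as a strict yyyyMMdd date and as an integer. This is a sketch for illustration only; the class and helper names are hypothetical, and it does not reproduce the processor's internal logic.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Locale;

public class AmbiguousDateDemo {

    // True if the whole value parses with the given pattern (strict parsing).
    static boolean parsesAsDate(String value, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
        format.setLenient(false);
        ParsePosition pos = new ParsePosition(0);
        return format.parse(value, pos) != null && pos.getIndex() == value.length();
    }

    // True if the whole value is a plain integer.
    static boolean parsesAsNumber(String value) {
        try {
            Long.parseLong(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String value = "20110315";
        System.out.println("date (yyyyMMdd): " + parsesAsDate(value, "yyyyMMdd")); // true
        System.out.println("number:          " + parsesAsNumber(value));           // true
    }
}
```

Because both interpretations succeed, the value alone gives no basis for choosing "date" over "number".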

Outputs

Data attributes: None

Flags: None

Execution

Execution Mode | Supported
Batch | Yes
Real time monitoring | Yes
Real time response | Yes (see the note below)

Note: The Data Types Profiler produces a percentage consistency statistic, which is calculated over the set of records input to the processor. In a real time monitoring process, this set is limited by the configurable commit point on the reader (defined as a number of transactions or as a time limit). If a process containing a Data Types Profiler is executed as a real time response process, which handles records one at a time, this consistency measure will always be 100%.
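To see why one-at-a-time processing always yields 100%, consider a minimal sketch that computes the statistic separately for each batch of records. The batch splitting stands in for the reader's commit point. The class and method names are hypothetical, and nulls are assumed to have been excluded before the calculation.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class ConsistencyDemo {

    enum DataType { TEXT, NUMBER, DATE }

    // Percentage of values that have the most common type (nulls assumed excluded).
    static double consistency(List<DataType> types) {
        if (types.isEmpty()) {
            throw new IllegalArgumentException("no values to assess");
        }
        Map<DataType, Integer> counts = new EnumMap<>(DataType.class);
        for (DataType t : types) {
            counts.merge(t, 1, Integer::sum);
        }
        int max = counts.values().stream().max(Integer::compare).orElse(0);
        return 100.0 * max / types.size();
    }

    // Split the input into consecutive batches, as a commit point would.
    static List<Double> perBatch(List<DataType> types, int batchSize) {
        List<Double> results = new ArrayList<>();
        for (int i = 0; i < types.size(); i += batchSize) {
            int end = Math.min(i + batchSize, types.size());
            results.add(consistency(types.subList(i, end)));
        }
        return results;
    }

    public static void main(String[] args) {
        List<DataType> column = List.of(
                DataType.NUMBER, DataType.NUMBER, DataType.TEXT, DataType.NUMBER);
        System.out.println("whole set:    " + consistency(column));   // 75.0
        System.out.println("batches of 1: " + perBatch(column, 1));   // [100.0, 100.0, 100.0, 100.0]
    }
}
```

A batch of one record has exactly one type, and that type is by definition the most common one in the batch, so every single-record batch scores 100%.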

Results Browsing

In addition to the number of records analyzed, the following statistics are available in the Results Browser for each attribute:

Statistic | Meaning
Text | The number of values that were recognized as having a textual format
Date | The number of values that were recognized as having a date format
Number | The number of values that were recognized as having a number format
% Consistency | The consistency of the data types in each attribute, that is, the percentage of values that were recognized as matching the most common data type
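As a worked formula, assuming that null values are excluded from the denominator (since they are counted separately), and writing T, D and N for the Text, Date and Number counts of an attribute:

$$\text{\% Consistency} = 100 \times \frac{\max(T, D, N)}{T + D + N}$$

For example, an attribute with T = 2, D = 0 and N = 98 would have a consistency of 100 × 98 / 100 = 98%.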

Example

In this example, the Data Types Profiler is run on all attributes in a table of Customer records:

Oracle ® Enterprise Data Quality Help version 9.0
Copyright © 2006,2011 Oracle and/or its affiliates. All rights reserved.