You are here: Processor Library > Profiling > Character Profiler

Character Profiler

The Character Profiler discovers all the distinct characters that exist in a number of text attributes, and how often they occur.

Use

The Character Profiler is particularly useful to find unexpected characters in text attributes that may need to be checked for on an ongoing basis (using Invalid Character Check), removed (using Denoise) or replaced (using Character Replace). Normalizing character discrepancies is also useful before Parsing.

The results are created so that they can easily be added to Reference Data for any of the above purposes.

Also, where a source of data contains records from a number of different countries, the Character Profiler can help to understand the ranges of characters in the data.

Configuration

Inputs

Any String attributes that you wish to search for character instances.

Options

None

Outputs

Data attributes

None

Flags

None

Execution

Execution Mode

Supported

Batch

Yes

Real-time Monitoring

Yes

Real-time Response

Yes

Results Browsing

The Character Profiler produces a summary view of its results, showing the following statistics:

Statistic

Meaning

Character

The character found in the data (See Note below)

Decimal

The decimal Unicode character reference. Note that a hash character is used to prefix the character references, so that the references can be used directly in Reference Data.

Hex

The hexadecimal Unicode character reference. Note that #x is used to prefix the character references, so that the references can be used directly in Reference Data.

Total

The total number of occurrences of the character across the selected input attributes.

Record Count

The number of records containing the character in any of the selected input attributes.

[Attribute name] Total

The number of occurrences of the character in the attribute.

[Attribute name] Record Count

The number of records containing the character in the attribute.

Note: If you see a square character in the Results Browser, this is likely to be because you do not have the required fonts installed on your client to view the actual character, or (in rare cases) because the fonts are installed but require custom font.properties files to be correctly rendered in OEDQ and other Java applications. If you copy and paste the character into another application (such as Microsoft Excel) and still cannot see it correctly (in this case, it will typically be represented by a ?), you do not have the required fonts installed.

Example

In this example, The Character Profiler is used to find unusual characters in some multi-language data from a Unicode database. The user chooses to look at the low frequency characters first by sorting the results by the Total column (ascending):

Oracle ® Enterprise Data Quality Help version 9.0
Copyright © 2006,2011 Oracle and/or its affiliates. All rights reserved.