Character Profiler

The Character Profiler finds every distinct character that occurs in one or more text attributes, and reports how often each character occurs.

Use

The Character Profiler is particularly useful for finding unexpected characters in text attributes. Such characters may need to be checked for on an ongoing basis (using Invalid Character Check), removed (using Denoise), or replaced (using Character Replace). Normalizing character discrepancies is also useful before Parsing.

The results are formatted so that they can easily be added to Reference Data for any of the above purposes.

Where a data source contains records from several different countries, the Character Profiler can also help you understand the ranges of characters present in the data.
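As a rough illustration of what "ranges of characters" can mean for multi-country data, the following Python sketch groups alphabetic characters by the first word of their Unicode name (for example LATIN or CYRILLIC). This is not part of the Character Profiler; the function names script_hint and summarize_by_script are hypothetical, and using the first word of the Unicode name is only a heuristic, not a true script classification.

```python
import unicodedata
from collections import Counter


def script_hint(ch):
    """Return the first word of the character's Unicode name, e.g. 'LATIN'.

    Heuristic only (an assumption for this sketch): the first word of the
    name usually, but not always, indicates the script.
    """
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNNAMED"


def summarize_by_script(values):
    """Count alphabetic characters in the given strings, grouped by script hint."""
    counts = Counter()
    for value in values:
        for ch in value:
            if ch.isalpha():
                counts[script_hint(ch)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample values standing in for a text attribute.
    print(summarize_by_script(["José", "Жук"]))
```

A summary like this can point to which character ranges deserve a closer look in the profiler results.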

Configuration

Inputs

Any String attributes that you want to profile for character occurrences.

Options

None

Outputs

Data attributes

None

Flags

None

Execution

Execution Mode	Supported
Batch	Yes
Real-time Monitoring	Yes
Real-time Response	Yes

Results Browsing

The Character Profiler produces a summary view of its results, showing the following statistics:

Statistic	Meaning
Character	The character found in the data (see Note below).
Decimal	The decimal Unicode character reference. The reference is prefixed with a hash character (#) so that it can be used directly in Reference Data.
Hex	The hexadecimal Unicode character reference. The reference is prefixed with #x so that it can be used directly in Reference Data.
Total	The total number of occurrences of the character across the selected input attributes.
Record Count	The number of records containing the character in any of the selected input attributes.
[Attribute name] Total	The number of occurrences of the character in the attribute.
[Attribute name] Record Count	The number of records containing the character in the attribute.
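To make the difference between the Total and Record Count statistics concrete, here is a minimal Python sketch that computes the same columns for a small set of records. It is an illustration only, not how OEDQ implements the processor: the function name profile_characters, the dictionary-based record layout, and the uppercase hexadecimal digits are all assumptions made for this example.

```python
from collections import Counter


def profile_characters(records, attributes):
    """Build one result row per distinct character found in the attributes."""
    total = Counter()         # occurrences across all selected attributes
    record_count = Counter()  # records containing the character in any attribute
    attr_total = {a: Counter() for a in attributes}
    attr_record_count = {a: Counter() for a in attributes}

    for record in records:
        seen_in_record = set()
        for attr in attributes:
            value = record.get(attr) or ""  # treat missing/null values as empty
            total.update(value)             # counts every occurrence
            attr_total[attr].update(value)
            distinct = set(value)           # each character once per record
            attr_record_count[attr].update(distinct)
            seen_in_record |= distinct
        record_count.update(seen_in_record)

    rows = []
    for ch in total:
        row = {
            "Character": ch,
            "Decimal": f"#{ord(ch)}",   # hash prefix, as described above
            "Hex": f"#x{ord(ch):X}",    # #x prefix; uppercase digits are an assumption
            "Total": total[ch],
            "Record Count": record_count[ch],
        }
        for attr in attributes:
            row[f"{attr} Total"] = attr_total[attr][ch]
            row[f"{attr} Record Count"] = attr_record_count[attr][ch]
        rows.append(row)

    # Lowest-frequency characters first, like an ascending sort on Total.
    rows.sort(key=lambda r: (r["Total"], ord(r["Character"])))
    return rows


if __name__ == "__main__":
    # Hypothetical sample data.
    sample = [{"name": "José", "city": "Zürich"}, {"name": "Jose", "city": None}]
    for row in profile_characters(sample, ["name", "city"]):
        print(row)
```

A character that appears twice in one value adds 2 to Total but only 1 to Record Count, which is why the two columns can differ.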

Note: If you see a square character in the Results Browser, the most likely cause is that the fonts required to display the actual character are not installed on your client. In rare cases, the fonts are installed but require custom font.properties files to render correctly in OEDQ and other Java applications. To check, copy and paste the character into another application (such as Microsoft Excel). If it still does not display correctly (it will typically appear as a ?), the required fonts are not installed.

Example

In this example, the Character Profiler is used to find unusual characters in some multi-language data from a Unicode database. The user chooses to look at the low-frequency characters first by sorting the results by the Total column (ascending):