Using Oracle Data Miner 4.1

Before You Begin

Purpose

This tutorial covers the use of Oracle Data Miner 4.1 to perform data mining activities against Oracle Database 12c Release 12.1.0.2. Oracle Data Miner 4.1 is included as an extension of Oracle SQL Developer, version 4.1. In this lesson, you learn how to use Data Miner to create a classification model in order to solve a business problem.

Oracle SQL Developer is a free graphical tool for database development. With SQL Developer, you can browse database objects, run SQL statements and SQL scripts, and edit and debug PL/SQL statements. With SQL Developer version 4.1, you can use Oracle Data Miner against Oracle Database 12c.

Time to Complete

Approximately 45 mins

Background

Data mining is the process of extracting useful information from masses of data by finding patterns and trends in the data. Data mining can be used to solve many kinds of business problems, including:

Predict individual behavior, for example, the customers likely to respond to a promotional offer or the customers likely to buy a specific product (Classification)
Find profiles of targeted people or items (Classification using Decision Trees)
Find natural segments or clusters (Clustering)
Identify the factors most strongly associated with a target attribute (Attribute Importance)
Find co-occurring events or purchases (Associations, sometimes known as Market Basket Analysis)
Find fraudulent or rare events (Anomaly Detection)

The phases of solving a business problem using Oracle Data Mining are as follows:

Problem Definition in Terms of Data Mining and Business Goals
Data Acquisition and Preparation
Building and Evaluation of Models
Deployment

Problem Definition and Business Goals

When performing data mining, the business problem must be well-defined and stated in terms of data mining functionality. For example, retail businesses, telephone companies, financial institutions, and other types of enterprises are interested in customer “churn”, that is, a previously loyal customer switching to a rival vendor. The statement “I want to use data mining to solve my churn problem” is much too vague. From a business point of view, it is much more difficult and costly to win back a defected customer than to prevent a disaffected customer from leaving; furthermore, you may not be interested in retaining a low-value customer. Thus, from a data mining point of view, the problem is to predict which customers are likely to churn with high probability, and also to predict which of those are potentially high-value customers.
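The two-part problem statement above (likely to churn, and high-value) can be illustrated with a minimal Python sketch. It is not part of Oracle Data Miner; the Customer fields, the thresholds, and the sample values are all hypothetical and stand in for scores that a model would produce.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: int
    churn_probability: float  # hypothetical model score between 0.0 and 1.0
    annual_value: float       # hypothetical measure of customer value


def retention_targets(customers, min_probability=0.7, min_value=1000.0):
    """Keep customers who are both likely to churn and high-value,
    ordered by value at risk (probability * value), highest first."""
    selected = [
        c for c in customers
        if c.churn_probability >= min_probability and c.annual_value >= min_value
    ]
    return sorted(
        selected,
        key=lambda c: c.churn_probability * c.annual_value,
        reverse=True,
    )


# Made-up sample data for illustration only.
customers = [
    Customer(1, 0.90, 5000.0),  # likely to churn, high value
    Customer(2, 0.95, 200.0),   # likely to churn, low value
    Customer(3, 0.20, 9000.0),  # unlikely to churn
    Customer(4, 0.75, 8000.0),  # likely to churn, highest value at risk
]

targets = retention_targets(customers)
print([c.customer_id for c in targets])  # [4, 1]
```

Customers 2 and 3 drop out because each fails one of the two conditions, which is the distinction the vague statement "solve my churn problem" does not capture.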

Data Acquisition and Preparation

A general rule of thumb in data mining is to gather as much information as possible about each individual, then let the data mining operations indicate any filtering of the data that might be beneficial. In particular, you should not eliminate an attribute because you think that it might not be important; let ODM’s algorithms make that decision. Moreover, since the goal is to build a profile of behavior that can be applied to any individual, you should eliminate specific identifiers such as name, street address, and telephone number. However, attributes that indicate a general location without identifying a specific individual, such as Postal Code, may be helpful. It is generally agreed that the data gathering and preparation phase consumes more than 50% of the time and effort of a data mining project.
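As a rough illustration of the identifier rule above, the following Python sketch removes attributes that identify a specific individual and keeps everything else, including the postal code. The column names and the sample record are hypothetical and do not come from the tutorial's sample data.

```python
# Hypothetical names of columns that identify one specific individual.
IDENTIFIER_COLUMNS = {"name", "street_address", "telephone_number"}


def strip_identifiers(record, identifier_columns=IDENTIFIER_COLUMNS):
    """Return a copy of the record without individual identifiers.
    All other attributes are kept, so the algorithms can decide
    which of them matter."""
    return {
        key: value
        for key, value in record.items()
        if key not in identifier_columns
    }


# Made-up record for illustration only.
record = {
    "name": "A. Customer",
    "street_address": "1 Example Street",
    "telephone_number": "555-0100",
    "postal_code": "02139",
    "age": 41,
    "has_mortgage": True,
}

prepared = strip_identifiers(record)
print(sorted(prepared))  # ['age', 'has_mortgage', 'postal_code']
```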

Building and Evaluation of Models

The Workflow creation process of Oracle Data Miner automates many of the difficult tasks during the building and testing of models. It’s difficult to know in advance which algorithms will best solve the business problem, so normally several models are created and tested. No model is perfect, and the search for the best predictive model is not necessarily a question of determining the model with the highest accuracy, but rather a question of determining the types of errors that are tolerable in view of the business goals.
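The point about tolerable errors can be made concrete with a small Python sketch. The confusion-matrix counts and the per-error costs below are invented for illustration; they are not results from Oracle Data Miner or from the tutorial's data.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all cases that the model classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def error_cost(fp, fn, cost_fp, cost_fn):
    """Total business cost of a model's errors, given a cost per error type."""
    return fp * cost_fp + fn * cost_fn


# Invented confusion-matrix counts for two candidate models.
model_a = {"tp": 50, "fp": 10, "fn": 40, "tn": 900}
model_b = {"tp": 80, "fp": 60, "fn": 10, "tn": 850}

# Assumed costs: missing a likely buyer (false negative) is far more
# expensive than contacting someone who does not buy (false positive).
COST_FP = 5
COST_FN = 100

for label, m in (("A", model_a), ("B", model_b)):
    acc = accuracy(**m)
    cost = error_cost(m["fp"], m["fn"], COST_FP, COST_FN)
    print(f"Model {label}: accuracy={acc:.2f}, error cost={cost}")
```

Under these assumed costs, Model A has the higher accuracy (0.95 versus 0.93) but the higher error cost (4050 versus 1300), so Model B would be the better fit for the business goal.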

Deployment

Oracle Data Mining produces actionable results, but the results are not useful unless they can be placed into the correct hands quickly. The Oracle Data Miner user interface provides several options for publishing the results.

Scenario

This lesson focuses on a business problem that can be solved by applying a Classification model. In our scenario, ABC Company wants to identify customers who are most likely to purchase insurance.

Note: For the purposes of this tutorial, the "Data Acquisition and Preparation" phase has already been completed, and the sample data set contains all required data fields. Therefore, this lesson focuses primarily on the "Building and Evaluation of Models" phase.

Context

Before starting this tutorial, you should have set up Oracle Data Miner for use within Oracle SQL Developer 4.1, by using the previous tutorial in this suite.

What Do You Need?

You need access to, or an installation of, the following:

Oracle Database: Minimum: Oracle Database 12c Enterprise Edition, Release 12.1.0.2 (12.1.0.2.0) with the Advanced Analytics Option.
The Oracle Database sample data, including the SH schema.
SQL Developer 4.1

Create a Data Miner Project

Before you create a Data Miner Project and build a Data Miner workflow, it is helpful to organize the Data Miner interface components within SQL Developer to provide simplified access to the necessary Data Miner features.

To begin, close all of the SQL Developer interface elements (which may include the Connections tab, the Reports tab, and others), and leave only the Data Miner tab open.

At this point, the data miner user (dmuser) has been created and a SQL Developer connection has been established. The Setting Up Oracle Data Miner 4.1 tutorial shows how to create a database account and SQL Developer connection for a data mining user named dmuser. This user has access to the sample data that you will be mining.

Note: If the Data Miner tab is not open, select Tools > Data Miner > Make Visible from the SQL Developer main menu.

In the Data Miner tab, right-click dmuser and select New Project.
In the Create Project window, enter a project name (in this example ABC Insurance) and then click OK.

Note: You may optionally enter a comment that describes the intentions for this project. This description can be modified at any time.

Result: The new project appears below the data mining user connection node.

Next, you will learn how to build a workflow for the classification model.

Build a Data Mining Workflow

A Data Miner Workflow is a collection of connected nodes that describes a data mining process.

A workflow:

Provides directions for the data mining server. For example, the workflow says "Build a model with these characteristics." The model is built by the data mining server, and the results are returned to the workflow.
Enables you to interactively build, analyze, and test a data mining process within a graphical environment.
May be used to test and analyze only one cycle within a particular phase of a larger process, or it may encapsulate all phases of a process designed to solve a particular business problem.
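One way to picture "a collection of connected nodes" is as a directed graph in which each node runs after the nodes it depends on. The Python sketch below is only a conceptual model, not how Oracle Data Miner stores or runs workflows; the node names are hypothetical, and it uses the standard-library graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Each hypothetical node maps to the set of nodes it depends on.
workflow = {
    "data_source": set(),
    "build_model": {"data_source"},
    "apply_model": {"build_model"},
}

# A valid run order visits every node after all of its dependencies.
run_order = list(TopologicalSorter(workflow).static_order())
print(run_order)  # ['data_source', 'build_model', 'apply_model']
```

A workflow that covers only one cycle of a phase would correspond to a small graph like this one, while a workflow that covers a whole process would simply contain more nodes and connections.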

What Does a Data Miner Workflow Contain?

Visually, the workflow window serves as a canvas on which you build the graphical representation of a data mining process flow, like the one you are going to create, shown here: