Using the SQL Query Node in a Workflow

Overview

Purpose

This tutorial covers the use of the new SQL Query Node in an Oracle Data Miner 4.0 workflow.

Time to Complete

Approximately 15 minutes.

Scenario

This lesson addresses a business problem that can be solved by applying a Classification model. In our scenario, ABC Company wants to predict which customers will have a high lifetime value (LTV), based on demographic data and other input attributes.

In this lesson, you create a workflow that combines data produced by a SQL Query node with data defined by a standard Data Source node. The joined data is then fed to a classification model to generate the predictive results.

The completed workflow looks like this:

Software Requirements

You need access to, or an installation of, the following software:
- Oracle Database 12c Enterprise Edition, Release 12.1 with Advanced Analytics Option.
- The Oracle Database sample data, including the unlocked SH schema.
- SQL Developer 4.0.

Prerequisites

Before starting this tutorial, you should have:

- Set up Oracle Data Miner for use within Oracle SQL Developer 4.0. If you have not already set up Oracle Data Miner, complete the lesson: Setting Up Oracle Data Miner 4.0
- Completed the lesson: Using Oracle Data Miner 4.0

Create a Data Miner Project

To create a Data Miner Project, perform the following steps:

In the Data Miner tab, right-click dmuser and select **New Project**, as shown here:

In the Create Project window, enter a project name (in this example, SH Schema) and then click OK.

Build the Data Miner Workflow

As discussed in the "Using Oracle Data Miner 4.0" tutorial, a Data Miner Workflow is a collection of connected nodes that describe a data mining process.

Data Mining Scenario

In this topic, you build a workflow that:

- Joins customer demographic data (from a database table) with aggregated sales data for each customer (generated by a SQL query).
- Feeds the joined data to a classification build node that is designed to predict which customers are most likely to join the Affinity Card program.

Required Task

Before creating the workflow, save RFM-SQL.txt to your local machine. This text document contains a SQL query that you will use for the SQL Query node in the workflow. Make a note of the saved location on your local machine.
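The contents of RFM-SQL.txt are not reproduced in this tutorial. To give a rough idea of what a per-customer aggregation of sales data can look like, here is a minimal Python sketch that runs a query against an in-memory SQLite table. Everything in it is an assumption made for illustration: the `sales` table and its three columns, the sample rows, and the query, which reads the file name as a recency/frequency/monetary (RFM) summary. It is not the query in RFM-SQL.txt, and SQLite syntax can differ from Oracle Database syntax.

```python
import sqlite3

# Hypothetical stand-in for a sales table; only the columns this sketch needs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (cust_id INTEGER, time_id TEXT, amount_sold REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (101, "2001-01-15", 50.0),
        (101, "2001-03-02", 25.0),
        (102, "2001-02-20", 10.0),
    ],
)

# Assumed query shape: one row per customer with the most recent purchase
# date, the number of purchases, and the total amount spent.
RFM_QUERY = """
    SELECT cust_id,
           MAX(time_id)     AS last_purchase,
           COUNT(*)         AS frequency,
           SUM(amount_sold) AS monetary
    FROM sales
    GROUP BY cust_id
    ORDER BY cust_id
"""

rfm_rows = conn.execute(RFM_QUERY).fetchall()
for row in rfm_rows:
    print(row)
```

The point of the sketch is the shape of the result: many sales rows per customer collapse into a single row keyed by customer ID, which is what makes the data joinable to a one-row-per-customer demographics table.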

To create the workflow for this process, perform the following steps.

Create a Workflow and Add Data Sources

Right-click the SH Schema project and select **New Workflow** from the menu.

In the Create Workflow window, enter **Predicting Customer LTV** as the name and click OK.

Next, add a Data Source node to the workflow.

A. In the Components tab, open the **Data** category.

B. Drag and drop the **Data Source** node onto the Workflow pane.

Result: A Data Source node appears in the Workflow pane and the Define Data Source wizard opens.

In Step 1 of the wizard:

A. Click **Add Schemas**, beneath the Available Tables/Views list, as shown here:

Select the **SH.SUPPLEMENTARY_DEMOGRAPHICS** table and click **Finish** in the wizard.

Next, drag and drop a **SQL Query** node from the Data group to the workflow, just underneath the Customer Data node, like this:

Join the Data

In this topic, you join the customer demographics data to the aggregated sales data defined in the SQL Query node.
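Conceptually, the join matches each demographics row with the aggregated sales row for the same customer. The following Python sketch shows that idea with SQLite. The table names, the sample values, the choice of `cust_id` as the join key, and the use of an inner join are all assumptions for illustration; the Join node's actual settings are whatever you define in the Edit Join Node window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical miniature versions of the two inputs to the join.
conn.executescript("""
    CREATE TABLE demographics (cust_id INTEGER PRIMARY KEY, affinity_card INTEGER);
    CREATE TABLE rfm (cust_id INTEGER PRIMARY KEY, frequency INTEGER, monetary REAL);
    INSERT INTO demographics VALUES (101, 1), (102, 0), (103, 0);
    INSERT INTO rfm VALUES (101, 2, 75.0), (102, 1, 10.0);
""")

# Inner join on the assumed key: customers without aggregated sales
# (103 in this sample) do not appear in the joined output.
joined = conn.execute("""
    SELECT d.cust_id, d.affinity_card, r.frequency, r.monetary
    FROM demographics d
    JOIN rfm r ON r.cust_id = d.cust_id
    ORDER BY d.cust_id
""").fetchall()

for row in joined:
    print(row)
```

Each joined row carries the target column alongside the sales-derived inputs, which is the form a classification build expects.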

Follow these steps:

To begin, add a Join node to the workflow and connect the two data nodes to the join node.

A. Open the **Transforms** group in the Components tab.

B. Drag and drop a **Join** node onto the workflow, like this:

Next, double-click the **Join** node to display the Edit Join Node window.

A. In the Join tab, click the **Add** tool (green "+" icon), like this:

Create Classification Models

In this topic, you add a Classification node to the workflow, as you did in the "Using Oracle Data Miner 4.0" tutorial. However, in this scenario, you will remove two of the default algorithms from the Class Build node and define only the Decision Tree and SVM models.

Then, in the next topic, you will build the two classification models and compare the results.

Follow these steps:

Add a Classification node to the Workflow, and connect the Join node to it.

A. First, expand the **Models** category in the Components tab. Then drag and drop the **Classification** node to the Workflow pane, like this:

Select the **CLAS_GLM_#_#** and **CLAS_NB_#_#** model settings. Then click the **Remove** tool (red "x" icon), as shown below. (Select **Yes** in the warning message window.)

In the Edit Classification Build Node window:

A. Select **AFFINITY_CARD** as the Target attribute.

B. Select **CUST_ID** as the Case ID attribute.

Next, select the **Input** tab.

A. De-select the **Determine inputs automatically** option, as shown here:

In the workflow, rename the Class Build node to **Predicted High LTV Customers**, as shown here:

Build and Compare the Models

In this topic, you build the two classification models against the joined source data. Then, you examine the model results.

Right-click the Predicted High LTV Customers node and select **Run** from the pop-up menu.

Notes:

When the node runs, it builds and tests all of the models that are defined in the node.

As before, a green gear icon appears on the node borders to indicate a server process is running, and the status is shown at the top of the workflow window.

When the build is complete, all nodes contain a green check mark in the node border.
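Testing a model generally means scoring cases that were held out of the build. As a minimal sketch of that idea, the Python below splits a set of case IDs into build and test subsets. The function name, the 40% test fraction, and the seed are assumptions chosen for illustration and are not taken from the node's settings.

```python
import random

def split_build_test(case_ids, test_fraction=0.4, seed=42):
    """Shuffle case IDs reproducibly and hold out a fraction for testing.

    Returns (build_ids, test_ids). The fraction and seed are illustrative
    defaults, not values read from Oracle Data Miner.
    """
    shuffled = list(case_ids)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Ten hypothetical customer IDs.
build_ids, test_ids = split_build_test(range(101, 111))
print(len(build_ids), "build cases,", len(test_ids), "test cases")
```

Because every model is evaluated on the same held-out cases, their test results can be compared directly.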

Right-click the Class Build node and select **Compare Test Results** from the menu, like this:

Next, select the **Performance Matrix** tab.

A. Select the SVM model.
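A performance matrix of this kind is commonly a table of actual versus predicted target values, often called a confusion matrix. The sketch below computes one in plain Python for a binary 0/1 target. The function name and the sample labels are made up for illustration and are not results from this workflow.

```python
from collections import Counter

def performance_matrix(actual, predicted):
    """Count (actual, predicted) pairs for a binary 0/1 target."""
    counts = Counter(zip(actual, predicted))
    return {(a, p): counts.get((a, p), 0) for a in (0, 1) for p in (0, 1)}

# Hypothetical test-set labels and model predictions.
actual = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0]

matrix = performance_matrix(actual, predicted)

# Overall accuracy: correct predictions (the diagonal) over all cases.
accuracy = (matrix[(0, 0)] + matrix[(1, 1)]) / len(actual)

for (a, p), n in sorted(matrix.items()):
    print(f"actual={a} predicted={p}: {n}")
print("accuracy:", accuracy)
```

Reading the off-diagonal cells separately matters when the two kinds of error have different costs, for example wrongly predicting "1" versus missing a true "1".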

Next, right-click the Predicted High LTV Customers node and select **View Models > CLAS_DT_1_#**.

Result: A Decision Tree display window opens with the model name. The display should look something like this:

Navigate to and select **Node 2**, which is after the second split for a prediction of "1" ("Yes" for the Affinity Card).

Dismiss the Decision Tree display tab as shown here:

Click Save All to save the workflow.

Summary

In this lesson, you learned how to use the SQL Query node in a workflow. Specifically, you:

- Joined customer demographic data (from a database table) with aggregated sales data, as defined in a SQL Query node.
- Fed the joined data to a Classification Model.
- Examined the predictive results.

Resources

To learn more about Oracle Data Mining:

See the Oracle Data Mining and Oracle Advanced Analytics pages on OTN.
Refer to additional OBEs in the Oracle Learning Library.
See the Data Mining Concepts manuals:
- Oracle Database 12c Release 1 (12.1)
- Oracle Database 11g Release 2 (11.2)

Credits

Lead Curriculum Developer: Brian Pottle

Other Contributors: Charlie Berger, Mark Kelly, Margaret Taft, Kathy Taylor
