|
Solving Business Problems with Oracle Data Mining
Solving Business Problems with Oracle Data Mining
This tutorial shows you how to use Oracle Data Mining to solve
business problems.
Approximately 2 hours
This tutorial covers the following topics:
Place the cursor over this icon to load and view all the
screenshots for this tutorial. (Caution: This action loads all screenshots simultaneously,
so response time may be slow depending on your Internet connection.)
Note: Alternatively, you can place the cursor over an individual icon in the following steps to load and view only the screenshot associated with that step. You can hide an individual screenshot by clicking it.
Data Mining is sometimes called Knowledge Discovery-its goal
is to provide actionable information, not found by other means. This information
can improve the functioning of your business. For example, suppose a marketing
campaign results in a 2% positive response. If Data Mining can help focus the
campaign on the people most likely to respond, resulting in a 3% response, then
the business outcome is a 50% increase in revenue.
Data Mining can be divided into two types of "Learning".
| Supervised Learning |
The goal is to predict a value for a particular
characteristic, or attribute that describes some behavior. The attribute
being predicted is called the Target Attribute. |
| Unsupervised Learning |
The goal is to discover relationships and
patterns rather than to determine a particular value. Here, there is no
Target Attribute. |
Oracle Data Mining (ODM) is powerful
data mining software embedded in the Oracle Database that enables you to discover
new insights hidden in your data. Oracle Data Mining helps businesses to target
their best customers, find and prevent fraud, discover the most influential
attributes that affect Key Performance Indicators (KPIs), and find valuable
new information hidden in the data. Oracle Data Mining helps technical professionals
find patterns in their data, identify key attributes, discover new clusters
and associations, and uncover valuable insights.
Oracle Data Mining enables companies to:
| KNOW MORE |
Leverage your data and discover valuable
new information and insights that were previously hidden. |
| DO MORE |
Build applications that automate the extraction
and dissemination of new information and insights. |
| SPEND LESS |
Oracle Data Mining is significantly less
expensive compared to traditional approaches and, as a component of your
investment in Oracle technology, significantly reduces your total cost of
ownership. |
Oracle Data Mining enables you to go
beyond standard query and reporting tools and Online Analytical Processing (OLAP).
Query and reporting and OLAP tools can tell you who are your top customers,
what products have sold the most, and where you are incurring the highest costs.
With Oracle Data Mining, you can implement strategies to:
Traditional business intelligence (BI) tools such as reports,
interactive query and reporting, and Online Analytical Processing (OLAP), only
report on what has happened in the past. Oracle Data Mining (ODM) allows you
to go beyond traditional BI and reporting to mine your data and build advanced
data mining applications. ODM enables you to discover new insights, segments
and associations, make more accurate predictions, find the variables that most
influence your business, and in general, extract more information from your
data. For example, by analyzing the profiles of your best customers, ODM enables
you to build data mining models and integrated applications to identify customers
who are likely to become your best customers in the future. These customers
may not represent your most valuable customers today, but may match profiles
of your current best customers. Moreover, with ODM you can do more and transform
a predictive model into a regular production application that distributes lists
of your most promising customers to your Sales force every Monday morning. Knowing
the strategic value of your customers which are likely to
become profitable customers in the future and which are not, or predicting which
customers are likely to churn or likely to respond to a marketing offer
and integrating this information into your operations is the key to proactively
managing your business.
The phases of solving a business problem using Oracle Data
Mining are as follows:
Note: This tutorial is not
intended as a comprehensive course on Oracle Data Mining. It illustrates the techniques required to carry out some common data mining operations. If you want more background information about
some of the topics, see the Oracle Data Mining Concepts Guide.
Back to Topic List
An electronics store chain wants to distribute a discount
card to its customers, but only to those customers who are expected to increase
their buying (and thus the companys revenue) because of this card. A test
campaign was run on a sample of customers and the results were compiled into
a table containing the customer demographics, purchasing patterns, and a measure
of revenue produced by each customer.
A dataset describing the customers and results in the test
campaign is used to create a model that can be applied to all customers for
the purpose of predicting revenue levels expected from each customer who uses
a discount affinity card the Target attribute to be predicted is AFFINITY_CARD
with values 0 (= low revenue) and 1 (= high revenue). The distinct target values
(0 and 1 in this case) are sometimes called the target classes,
thus the prediction of a target value for each customer is called Classification.
Back to Topic List
Before starting this tutorial, you should:
Back to Topic
List
To launch Oracle Data Miner perform the following.
| 1. |
Double-click <your_path>\bin\odminerw.exe.
|
| 2. |
To connect as dmuser1, click OK.
|
| 3. |
You may want to confirm sample tables and views in your user schema.
Expand Data Sources > dmuser1.
This displays both tables and views. Click Tables to view
tables; Click Views to have the list of views.

|
Back to Topic List
The data used in the Data Mining process usually has to be
collected from various locations. Also, some transformation of the data is usually
required to prepare the data for Data Mining operations. Oracle Data Mining
has powerful utilities for sampling, recoding, discretizing and creating new
transformed attributes.
In this section, you perform the following tasks:
Back to Topic List
Display Data
and Statistics
Perform the following steps:
| 1. |
You can display the structure of a table or view. Click
MINING_DATA_BUILD_V.

|
| 2. |
Click the Data tab to view a sample of the contents.
Scroll to the right of the window to view full details

Note: The default number of records shown is 100; Enter a different number
in the Fetch Size window, then click Refresh to change the size of the
display. Also, you can click Fetch Next to add more rows to the display.
|
| 3. |
Right-click MINING_DATA_BUILD_V to expose a menu with more options.
You have access to Transformation wizards and other tools for data exploration.
|
| 4. |
To see a statistical summary, you can click one of the two selections
depending on the data format type. Click Show Summary Single-Record.

Your output should match the image below:

For each numerical attribute, Maximum and Minimum values, as well as
average and variance are shown. These statistics are calculated on a sample.
The sample size in this example is 999. You can change the sample size
by adjusting ODM Preferences, available on the Tools pull-down menu.
|
| 5. |
For any highlighted attribute, you can view a distribution of values.
a. Highlight AFFINITY_CARD.

b.Click Histogram.

The default maximum number of bins is 10; this number can be changed by clicking
Preferences.
Numerical attributes are divided into the designated number of bins of
equal width between the minimum and maximum. The bins are displayed in
ascending order of attribute values. Categorical attributes are binned
using the "Top N" method(N is the number of bins). The N values
occurring most frequently have bins of their own; the remaining values
are thrown into a bin labeled "Other". The bins are displayed
in descending order of bin size.
When done reviewing, click OK, then close the Data Summarization Viewer window.
|
Use Data Transformation
wizard
The Mining Activity Guides assist you in joining data
from disparate sources into one view or table. Also, they carry out transformations
that are required by a particular algorithm. However, there are transforms that typically
are completed on a standalone basis using one of the following Data Transformation
wizards.
 |
Filter |
 |
Recode |
 |
Derive field |
In this tutorial, you learn how to use the Filter Single -Record
Transformation wizard.
Suppose you want to concentrate on customers between the ages
of 21 and 35. You can filter the data to include only those people. Oracle Data
Miner provides a filtering transformation to define a subset of the data based
upon attribute values. Perform the following steps:
| 1. |
Click Data > Transform > Filter Single-Record.

|
| 2. |
Click Next on the Welcome page.
|
| 3. |
Identify the input data and click Next. In this example the view
used is MINING_DATA_BUILD_V.
Note: If you accessed the wizard by right-clicking the table or view
name, then the data is already known and this step is skipped.
|
| 4. |
Enter a name for the resultant view and click Next.

|
| 5. |
Enter the filter condition in the Filter text box. Alternatively, to
construct the condition in a dialog box click the icon to the right
of the Filter field.

|
| 6. |
In the Expression Editor page, peform the following steps:
a. Double-click the attribute name AGE.

b. Click the >= button and type 21 to construct
the first part of the condition shown.

c. Click AND to continue defining the full condition. Double
Click AGE.

d. Click the <= button and type 35.

e. Click the Validate button to check that the condition
is satisfied by a subset of the source data.

f. Click OK to display the condition in the Filter window.

|
| 7. |
Click Next.

|
| 8. |
Click Finish to complete the transformation.

Your output should match the image below:

|
Back to Topic
Back to Topic List
When the data mining problem has been defined and the
source data identified, there are two phases remaining in the Data Mining Process:
Build/Evaluate models and deploy the results. Oracle Data Miner contains Activity
Guides for the purpose of carrying out these phases with the minimum of required
intervention. Moreover, the implicit and explicit choices and settings used
in the Build activity can be passed on seamlessly to the Apply activity. Therefore, many operations usually required are hidden or eliminated.
The Build Activity Wizard allows you to :
 |
Identify supplemental
data to add to the case table(the basic source data) |
 |
Select the Data Mining Functionality
and Algorithm |
 |
Adjust the Activity settings
manually, rather than accept automatic settings |
This example uses the Decision Tree Classification algorithm to create, evaluate, and apply a predictive model classifying customers as fitting the profile of either a high-revenue or low-revenue customer. You need to perform the following tasks:
Back to Topic List
Build a Predictive Model using a Mining Activity
| 1. |
For this topic, you use the original view, not the view you just created. Click MINING_DATA_BUILD_V in the left navigator.

|
| 2. |
To launch the Activity Build Wizard, click Activity
> Build.

|
| 3. |
The New Activity Wizard-Welcome page is displayed.
Click Next.
|
| 4. |
On the Select Mining Type page, accept the default of Classification Function Type and the Decision Tree Algorithm. Click Next.
|
| 5. |
To select all the data to be mined from the case table or view, ensure that the box Join additional data with case table is not checked. If the box is checked, you can specify data from other tables and schemas to be joined with the case table. This example assumes that all the necessary data is in the case table MINING_DATA_BUILD_V. From the Unique Identifier drop down list, select CUST_ID. Then click Next.

|
| 6. |
You can specify the attribute to be predicted for a data mining problem. Click the AFFINITY_CARD radio button to indicate the target and then click Next.
Note: The Target Attribute to be predicted in this tutorial is AFFINITY_CARD
|
| 7. |
Some evaluation results depend upon your choice of Preferred Target Value. In this example, you want to identify the high-revenue customers (Target Value = 1). Select 1 for the Preferred Target Value and click Next.

|
| 8. |
Enter a name for the activity that is descriptive and click Next.
|
| 9. |
The last step of the Activity Wizard is displayed. You can display the settings in the Activity. Click Advanced Settings.

|
| 10. |
The parameters have default settings that ensure a good result. The Build parameters determine how the branches of the decision tree are constructed. Click OK.

|
| 11. |
Ensure that the Run upon finish check box is selected. Click Finish.
When the Activity Wizard is completed, the steps appropriate to the chosen activity are displayed. If you choose Run Upon Finish, the steps are executed to completion in sequence and a check appears on the right side of each step as it is completed.
|
| 12. |
To view the structure of the tree, click Result in the Build step.
|
| 13. |
Your results might differ from the display due to the small size of the sample dataset. The default view shows all nodes and the attribute values used to determine splits. You can highlight a node to show the rule for a record to be included in that node. The predicted value is the target value of the majority of records in that node. Confidence is the percentage of records in the node having the predicted taget value. Cases is the actual number of cases in the source data satisfying the rule for that node. Support is the percentage of cases in the source data satisfying the rule for that node.
|
| 14. |
You may want to view only the terminal nodes(known as Leaves) and eliminate the intermediate nodes. Click the Show Leaves Only check box.
|
| 15. |
A decision tree is sensitive to missing values when applied to new data. However, if the splitting attribute is missing, the ODM Decision Tree Algorithm provides an alternative attribute known as surrogate to be used in its place. If both the splitting attribute and its surrogate are missing, the predicted value is determined at the parent node of the tree. To view the surrogate, highlight a node and click the Surrogate radio button.
Close the Build Results window.
|
| 16. |
To evaluate the Decision Tree Model, click Result in the Test Metrics step.
|
| 17. |
Click the Accuracy tab.
|
| 18. |
Click the More Detail button to see the Confusion Matrix.
|
| 19. |
The Confusion Matrix is calculated by applying the model to the hold-out sample from the test campaign. The values of AFFINITY_CARD are known and are represented by the rows; the columns are the predictions made by the Classification Model. The upper right and lower left cells indicate potential prediction errors. Most business problem solutions attempt to minimize one type of error. For example, the lower left cell contains the number of “missed opportunities” – cases of high-revenue customers who were not correctly identified by the model. Click the ROC tab.
|
| 20. |
To explore possible changes in the model's parameters, you can slide the red line back and forth by clicking the arrow buttons at the bottom right of the graph and observe the change in the Confusion Matrix until the values indicate a good solution to the problem. Click Lift.
|
| 21. |
You can view two graphs showing different interpretations of the lift calculations. With the Cumulative Lift radio button selected, this indicates how much better the model does than the naive guess. In this example, the Decision Tree finds about 2.5 times more positive cases than chance in the first quantile. Select the Cumulative Positive Cases radio button.
|
| 21. |
The graph indicates how many cases must be analysed in order to find the desired number of positive cases, that is high-revenue customers.
|
Back to Topic
Apply a Predictive Model using a Mining Activity
When a model is applied to new data, the input data must be
prepared and transformed in exactly the same way that the source data for the
Build activity was prepared. The Apply activity is based on a Build activity.
This Build activity passes to the Apply activity whatever knowledge is required
to prepare the input data appropriately. Perform the following steps:
| 1. |
Click Activity > Apply

|
| 2. |
Click Next.

|
| 3. |
Expand Classification and select the Build Activity that you just built. Then click Next.
|
| 4. |
Click the Select link to choose the input data for the Apply procedure.
|
| 5. |
Expand DMUSER1 and select MINING_DATA_APPLY_V. Then click OK.
|
| 6. |
Accept the default data source. Click Next.
|
| 7. |
A table will be produced containing the Prediction and Probability for each customer. You can select additional attributes for display in the results. Accept the defaults and click Next.
|
| 8. |
You have a choice of formats for the Predictions. Accept the default, which includes the Target value with the highest Probability for each case, then click Next.
|
| 9. |
Enter a Name for the activity and click Next.
|
| 10. |
Ensure that the Run upon finish check box is selected. Click Finish.
|
| 11. |
Your Data Mining activity is running. When completed, click the Result link.
|
| 12. |
For each customer (case), the prediction of either 1 (high-revenue) or 0 (low-revenue) is included, along with the Confidence in that Prediction (Probability). Cost is another measure of the value of the prediction – low Cost means high Probability. Since this model is a Tree, where each branch represents a rule, you can highlight a row and click Rule to see how the prediction was made.
|
| 13. |
The rule logic is explained.
|
Back to Topic
In this tutorial, you learned how to:
 |
Launch Oracle Data Miner |
 |
Prepare Data for Data Mining |
 |
Use the Mining Activity Guides |
Back to Topic List
Place
the cursor on this icon to hide all screenshots.
|