Solving Business Problems with Oracle Data Mining
Solving Business Problems with Oracle Data Mining
This tutorial shows you how to use Oracle Data Mining to solve business problems.
Approximately 1 hour
This tutorial covers the following topics:
Place the cursor over this icon to load and view all the screenshots for this tutorial. (Caution: Because this action loads all screenshots simultaneously, response time may be slow depending on your Internet connection.)
Note: Alternatively, you can place the cursor over each individual icon in the following steps to load and view only the screenshot associated with that step.
Oracle Data Mining (ODM) is powerful data mining software embedded in the Oracle Database that enables you to discover new insights hidden in your data. Oracle Data Mining helps businesses to target their best customers, find and prevent fraud, discover the most influential attributes that affect Key Performance Indicators (KPIs), and find valuable new information hidden in the data. Oracle Data Mining helps technical professionals find patterns in their data, identify key attributes, discover new clusters and associations, and uncover valuable insights.
Oracle Data Mining enables companies to:
| KNOW MORE |
Leverage your data and discover valuable new information and insights that were previously hidden. |
| DO MORE |
Build applications that automate the extraction and dissemination of new information and insights. |
| SPEND LESS |
Oracle Data Mining is significantly less expensive compared to traditional approaches and, as a component of your investment in Oracle technology, significantly reduces your total cost of ownership. |
Oracle Data Mining enables you to go beyond standard query and reporting tools and Online Analytical Processing (OLAP). Query and reporting and OLAP tools can tell you who are your top customers, what products have sold the most, and where you are incurring the highest costs. With Oracle Data Mining, you can implement strategies to:
 |
Anticipate and prevent customer attrition |
 |
Acquire new customers and identify the most profitable customers |
 |
Identify promising cross-sell opportunities |
 |
Detect noncompliant and fraudulent activities |
 |
Discover new clusters or segments |
 |
Develop customer profiles |
 |
Identify likely targets and promising leads in drug discovery |
 |
Find association relationships of co-occurring items and/or events |
 |
Mine unstructured data, that is, text. |
Traditional business intelligence (BI) tools such as reports, interactive query and reporting, and Online Analytical Processing (OLAP), only report on what has happened in the past. Oracle Data Mining (ODM) allows you to go beyond traditional BI and reporting to mine your data and build advanced data mining applications. ODM enables you to discover new insights, segments and associations, make more accurate predictions, find the variables that most influence your business, and in general, extract more information from your data. For example, by analyzing the profiles of your best customers, ODM enables you to build data mining models and integrated applications to identify customers who are likely to become your best customers in the future. These customers may not represent your most valuable customers today, but may match profiles of your current best customers. Moreover, with ODM you can do more and transform a predictive model into a regular production application that distributes lists of your most promising customers to your Sales force every Monday morning. Knowing the “strategic value” of your customers — which are likely to become profitable customers in the future and which are not, or predicting which customers are likely to churn or likely to respond to a marketing offer — and integrating this information into your operations is the key to proactively managing your business.
Back to Topic List
An electronics store chain wants to distribute a discount card to its customers, but only to those customers who are expected to increase their buying (and thus the company’s revenue) because of this card. A test campaign was run on a sample of customers and the results were compiled into a table containing the customer demographics, purchasing patterns, and a measure of revenue produced by each customer.
A dataset describing the customers and results in the test campaign is used to create a model that can be applied to all customers for the purpose of predicting revenue levels expected from each customer who uses a discount affinity card – the Target attribute to be predicted is AFFINITY_CARD with values 0 (= low revenue) and 1 (= high revenue). The distinct target values (0 and 1 in this case) are sometimes called the target “classes”, thus the prediction of a target value for each customer is called Classification.
Back to Topic List
Before starting this tutorial, you should:
| 1. |
Install Oracle Database 10.1.0.2.
|
| 2. |
Install Oracle Database Companion CD 10.1.0.2.
|
| 3. |
Download the Oracle Database 10.1.0.3 patchset from metalink and unzip. Install the product using OUI. Then run catpatch.sql and then utlrp.sql
|
| 4. |
Download the Oracle Data Mining patch 10.1.0.3.1 and unzip. Run backup.bat then dm10131_patch.bat.
|
| 5. |
Create a new folder and extract the Oracle Data Miner zipfile into the new folder.
|
| 6. |
Delete <odminer>\lib\odmapi.jar if it exists, then copy <ORACLE_HOME>\dm\lib\odmapi.jar into <odminer>\lib. <odmapi> is the folder into which you unzipped Oracle Data Miner.
|
| 7. |
Follow the instructions listed in readme. html for the Oracle Data Miner product in Appendix A: ODM Installation Notes to create a tablespace and user and load the schema.
|
Back to Topic List
To launch Oracle Data Miner, perform the following:
| 1. |
From the <odminer>\bin folder, double-click on odminerw.exe. This will launch the Oracle Data Miner user interface.
|
| 2. |
The first time you log in you need to create a new connection. Enter the following information and click OK.
Connection Name: orcl
User: odm
Password: odm
Host: <the machine where your Oracle Database is installed>
Port: 1521
SID: orcl
Note: the above values depend on your installation.

|
| 3. |
Once your connection has been created, click OK to open Data Miner.

|
Back to Topic List
Before you start creating a model to manipulate the data, you want to review the data. Perform the following:
| 1. |
Connections to the schemas required by ODM have been created. Expand Data Sources > ODM > Tables. Then scroll down and select MINING_DATA_BUILD to see a description of the data.

|
| 2. |
The table contains demographic and purchasing information of a subset of customers who took part in the test marketing campaign. Click the Data tab to see a sample of the data.

|
| 3. |
Scroll to the right to see the target attribute in this tutorial, which is the AFFINITY_CARD column.

|
| 4. |
The value is 1 for any customer who increased spending more than 10% in the campaign, and 0 otherwise.

|
Back to Topic List
The Oracle Data Miner user interface provides wizards to lead the user through the steps of each phase of the data mining process. In addition, the Mining Activity Guide assists the user in establishing settings to be used by several wizards, and in launching the wizards in the correct order.
You will use the Mining Activity Guide to build an Attribute Importance model.
A table to be mined may have attributes that do not contribute at all to building a good model; in fact, some attributes may actually reduce the model’s predictive power. The Attribute Importance function of Oracle Data Mining builds a list of the attributes ranked by their influence in predicting the value of the target attribute.
The steps involved include:
Back to Topic List
| 1. |
Select File > New Mining Activity > Build.

|
| 2. |
The Build Activity Welcome window appears. Click Next.

|
| 3. |
Change the name to MAG_AI_ACTIVITY1 and click Next.

|
| 4. |
Make sure that the Schema is set to your DM user, in this case ODM. Select MINING_DATA_BUILD for Table/View. Ensure Single record per case is clicked. Then click Next.

|
| 5. |
Make sure the Attribute Importance Mining Model Type is selected and click Next.

|
| 6. |
You have established the settings for running the wizards associated with building an Attribute Importance Model. Click Finish to launch the Mining Activity Guide.

|
| 7. |
The Mining Activity Window appears.

|
Back to Topic
The Mining Activity leads you through a sequence of operations in the correct order; each step launches a wizard to carry out the operation. The choices made previously while defining the Activity will be embedded into each of the wizards.
Not every step shown in the Activity Guide is required; for example, the Attribute Importance algorithm will carry out a default discretization (binning) scheme unless you want to define customized bins.
Normally, if the data has a large number of cases a sample is extracted for mining purposes. The input table for this tutorial has only 1500 cases, but you will define a sample to illustrate the technique.
| 1. |
In the Sample area of the Activity window, click Start.

|
| 2. |
There are two types of sampling – in certain situations the types of cases to be included must be specified using Stratified Sampling. In this tutorial, however, select the Random sampling type and click OK.

|
| 3. |
At the Sample Transformation Welcome window, click Next.

|
| 4. |
You identified the input table and format of the data when defining the Activity. Click Next.

|
| 5. |
You want the resultant sample to be a table. Accept the defaults and click Next.

|
| 6. |
You can define the sample as a percentage of the original number of rows or as a fixed number of rows. Click Sample Size and change to a value of 1000. Then click Next.

|
| 7. |
Click Preview Transform to see a few rows of the resultant sample.

|
| 8. |
Review the sample data. Click Advanced SQL to see the code that will generate the results.

|
| 9. |
Review the SQL. Click OK 2 times to return to the final wizard page.

|
| 10. |
Click Finish to return to the Activity Guide.

|
| 11. |
Note that the Sample operation is now flagged as Completed.

Discretization is the next step in the Mining Activity. Discretization is the operation of creating ranges of values (bins) for a given attribute. In this tutorial, you will let the algorithm select binning schemes automatically so you will not run this wizard.
|
Back to Topic
| 1. |
In the Build area of the Activity window, click Start.

|
| 2. |
At the Attribute Importance Model Build Welcome window, click Next.

|
| 3. |
Change the name to MAG_AI_BUILD1 and click Next.

|
| 4. |
You identified the input table and format of the data when defining the Activity. Click Next.

|
| 5. |
Data Preparation means binning; since you skipped the Discretization wizard, select Automatic Preparation and click Next.

|
| 6. |
You will be building a Classification model to determine your most loyal and profitable customers by predicting the value of the attribute AFFINITY_CARD. Highlight AFFINITY_CARD to indicate the target attribute and click Next.

|
| 7. |
In order to produce results that are as general as possible, unique identifiers should be removed. Scroll down to check if ID has been removed.

|
| 8. |
The wizard calculates that ID is an identifier and excludes it automatically. Click Next.

|
| 9. |
Click Finish to return to the Activity Guide.

|
| 10. |
Note that the Discretization step is flagged as Skipped. Since the steps of the activity are complete, the resultant code is packaged into a stored procedure and sent to the database for execution. The model name appears in the Active Tasks window together with its status. Right-click on the highlighted status and select View to initiate a Task window.

|
| 11. |
Click Refresh to update the display. When the execution is successful, you see the elapsed time and input and output names. Click the Model: MAG_AI_BUILD1 to see the ranked list of attributes.

|
| 12. |
The Importance Values have no absolute meanings except that positive values indicate influence in predicting the target value while zero or negative values indicate attributes that detract from the model. There is relative meaning; for example, you can deduce that HOUSEHOLD_SIZE is more than three times more important than AGE. This list can be used to select attributes used in the Classification model build.

You can save the ranked list to a text file or to an Excel file by clicking the Save icon at the top right of the list.
|
Back to Topic
Oracle Data Mining provides three different Classification algorithms: Naive Bayes, Adaptive Bayes Networks, and Support Vector Machines. In this tutorial, you will create a Classification Model using the Naives Bayes Method. Perform the following steps:
| 1. |
Select Model > Classification > Build.

|
| 2. |
At the Classification Model Build Welcome window, click Next.

|
| 3. |
Enter the name ODM_NB_BUILD1 and click Next.

|
| 4. |
Make sure that the Schema is set to your DM user, in this case ODM. Select MINING_DATA_BUILD for Table/View. Ensure Single record per case is clicked. Then click Next.

|
| 5. |
Naïve Bayes looks at the historical data and calculates conditional probabilities for the target values by observing the frequency of attribute values and of combinations of attribute values. For example, suppose A represents “the customer is married” and B represents “the customer increases spending”.
The Bayes theorem states that
Prob(B given A) = Prob(A and B)/Prob(A)
So, to calculate the probability that a customer who is married will increase spending, the algorithm must count the number of cases where A and B occur together as a percentage of all cases (“pairwise” occurrences), and divide that by the number of cases where A occurs as a percentage of all cases (“singleton” occurrences).If these percentages are very small, they probably won’t contribute to the effectiveness of the model, so for the sake of speed and accuracy, any occurrences below a certain Threshold are ignored. To set the thresholds, click Algorithm Settings.

|
| 6. |
Change the Singleton and Pairwise Threshold values to 0.1 and click OK.

|
| 7. |
Click Next.

|
| 8. |
You want to let the algorithm bin the data automatically. Select Automatic Preparation and click Next.

|
| 9. |
You must identify the target attribute which indicates what the model will predict. Highlight AFFINITY_CARD and click Next.

|
| 10. |
You can assign weights to the target values in two different ways so that the model is forced to predict more of one particular value. In this tutorial, weighting is not used. Make sure No is selected in both cases and click Next.

|
| 11. |
In order to build a model that can be applied to the general population, you need to eliminate any identifiers from the data. Scroll down to check if ID has been removed.

|
| 12. |
The wizard calculates that ID is an identifier and excludes it automatically. Click Next.

|
| 13. |
Click Finish to create the model.

|
| 14. |
The wizard uses all the information entered to create a Java program that builds a Naive Bayes Classification model. The package is sent to the server automatically. Right-click on the highlighted status and select View to initiate a Task window.

|
| 15. |
The Task Viewer opens to provide a view of the database objects created by ODM. You can watch the progress by clicking on the Refresh button. Note that some steps may occur too fast to be listed. When the Build task completes successfully, the Naive Bayes Model is stored in the database.

|
Back to Topic List
Now the model can be tested against the data that is distinct from the Build data. Perform the following steps:
| 1. |
Select Model > Classification > Test .

|
| 2. |
At the Classification Model Test Welcome window, click Next.

|
| 3. |
Enter the name ODM_NB_TEST1 and click Next.

|
| 4. |
Select the model ODM_NB_BUILD1 that you want to create a test result for. Then click Next.

|
| 5. |
Select Standard Test and click Next.

|
| 6. |
You need to identify the data to be used for the test. It must be distinct from the data used to build the model. Make sure that the Schema is set to your DM user, in this case ODM. Select MINING_DATA_TEST for Table/View. Ensure Single record per case is clicked. Then click Next.

|
| 7. |
Click Finish to run the test.

|
| 8. |
Right-click on the highlighted status and select View to initiate a Task window.

|
| 9. |
When the task has completed successfully, click on the Result: ODM_NB_TEST1 to see the Accuracy and Confusion Matrix.

|
| 10. |
The accuracy of this model when applied to the test data is about 77%. (You should see similar – maybe not exactly the same - results).
The matrix indicates the types of errors that the model is likely to make; the columns are predictions and the rows are actual values; for example, the number 344 in the lower left cell indicates the false-negative predictions – predictions of 0 when the actual value is 1.
Note that this model produces many more predictions of 0 than of 1, and there is a high number of false negative errors. In an actual development environment, more models would be created with various parameter choices, different binning scheme, or larger input build dataset, and tested until a model is found that minimizes the errors.

|
Back to Topic List
Lift is a different type of model test. It is a measure of how “fast” the model finds the actual positive target values. The questions this test provides includes “How much of my Customer database must I solicit to find 50% of the customers likely to buy Product X?”. ODM applies the model to test data to gather predicted and actual target values (the same data that was used in the prior Test), sorts the predicted results by Confidence in a positive prediction, divides the ranked list into 10 equal parts (quantiles), and then counts the Actual positive values in each quantile. Confidence is a probability calculated as part of the prediction process. Perform the following steps:
| 1. |
Select Model > Classification > Lift .

|
| 2. |
At the Classification Model Lift Welcome window, click Next.

|
| 3. |
Enter the name ODM_NB_LIFT1 and click Next.

|
| 4. |
Select the model ODM_NB_BUILD1 that you want to create a test result for. Then click Next.

|
| 5. |
You need to identify the data to be used for the test. It must be distinct from the data used to build the model. Make sure that the Schema is set to your DM user, in this case ODM. Select MINING_DATA_TEST for Table/View. Ensure Single record per case is clicked. Then click Next.

|
| 6. |
You must identify the target value that is considered "positive". In this case, you want to identify the customers who will increase their spending, so select 1 as the Target Value. Then click Next.

|
| 7. |
Click Finish to run the test.

|
| 8. |
Right-click on the highlighted status and select View to initiate a Task window.

|
| 9. |
When the task has completed successfully, click on the Result: ODM_NB_LIFT1 to see the results.

|
| 10. |
Two different results are shown in both tabular and graphical format. Cumulative Lift indicates how much better the model is at finding positive values than random selection (for the data and model in this tutorial, you should get about 2.7 in the top quantile, that is, the model should find about 3 times as many actual positive values than would be found in a random selection of 10% of the data). To view the other result, click the Cumulative Target radio button.

|
| 11. |
Cumulative Target is a display showing by percentage the number of positive values - you should see about 27% in the top quantile, about 47% in the top two, about 60% in the top three. The interpretation is that if this model is used to score customers, over 60% of high-revenue customers will be found by looking at only the top 30% of the data.

|
Back to Topic List
Once the model that best fits the problem is determined, it is ready to be deployed and applied to new data to produce business intelligence that will improve the enterprise – an on-going mechanism to identify the high-revenue customers. In this tutorial, you will use some of the historic data to simulate scoring data for which the target value is not known and must be supplied. Perform the following steps:
| 1. |
Select Model > Classification > Apply .

|
| 2. |
At the Classification Model Apply Welcome window, click Next.

|
| 3. |
Enter the name ODM_NB_APPLY1 and click Next.

|
| 4. |
Select the model ODM_NB_BUILD1 that you want to be applied to the data. Then click Next.

|
| 5. |
You need to identify the data to be used for the apply operation. Make sure that the Schema is set to your DM user, in this case ODM. Select MINING_DATA_APPLY for Table/View. Ensure Single record per case is clicked. Then click Next.

|
| 6. |
Each customer in the scored result list must be identified by some attribute value. Click the checkbox next to ID and click Next.

The result of the Apply operation is a table containing a prediction and probability for each customer.
|
| 7. |
There are several different formats available for the output table. In this case, you will use a unique predicted value for each customer. Select Values of highest target values (most likely), enter 1 for the Number of Top Target Values and click Next.

In each case, ODM calculates a probability for each distinct target value; these values always sum to 1.00.
In the current example, there are 2 target values, 0 and 1; if Prob(0) = .7, then Prob(1) = .3. By setting Number of Top Target Values = 1, the outcome with the higher probability is displayed. If the target has multiple values, say the 7 possible types of computer that the customer might buy, you can see the 3 most likely by setting Number of Top Target Values = 3.
By clicking on the radio button Value of Specific Target Values, and entering 1 in the window, you will see the probability of a prediction of 1 each case, no matter how low the probability.
|
| 8. |
In the output table, you will see the prediction and the confidence in that prediction (that is, the probability that the prediction is correct) for each individual. Enter the table ODM_NB_APPLY1 and click Next.

|
| 9. |
Click Finish to apply the model.

|
| 10. |
Right-click on the highlighted status and select View to initiate a Task window.

|
| 11. |
When the task has completed successfully, click on the Result: ODM_NB_APPLY1 to see the results.

|
| 12. |
In the output table, you see the prediction and the confidence in that prediction (that is, the probability that the prediction is correct) for each individual. Thus you have a scored list containing the IDs of customers who will most likely to increase spending if given an Affinity Card.
Refer to Step 10 of Testing the model to recall that this model predicts a value of 0 in almost every case. In a production environment, more models would be built with various parameter settings until a model is chosen that would predict values that are more reasonable than those shown in this result.

|
Back to Topic List
Support Vector Machines (SVM) can solve both Classification and Regression problems. In this case, you will use Classification, that is, the prediction of a discrete target value. In particular, you want to build a model that can be applied to the entire customer base to determine which class – high spender or not – a given customer belongs to.
The SVM Algorithm can use unstructured text as an attribute, along with structured attributes, to build a predictive model. A text column can be of type VARCHAR2 with size <= 4000, or CLOB.
The steps involved include:
Back to Topic List
Building and testing a Classification Model Without Text
The sample tables supplied with ODM include two containing a text column for the purpose of illustrating text mining with SVM. They are similar to the build and test tables used in the previous topics, except that comments by the customers are included. Before an unstructured (text) column can be used in building a predictive model, it must be indexed; the tables that you use in this tutorial have already been indexed, so you can proceed directly to the Model Build.
To show the effect of including free-form text in the modeling operations, you will build and test two models on the same data. In this case, you will exclude the text. Perform the following steps:
| 1. |
Select Model > Classification > Build.

|
| 2. |
At the Classification Model Build Welcome window, click Next.

|
| 3. |
Enter the name ODM_SVM_NOTEXT_BUILD1 and click Next.

|
| 4. |
Make sure that the Schema is set to your DM user, in this case ODM. Select MINING_BUILD_TEXT for Table/View. Ensure Single record per case is clicked. Then click Next.

|
| 5. |
Select the Support Vector Machine Algorithm and click Next.

|
| 6. |
You want to let the algorithm bin the data automatically. Select Automatic Preparation and click Next.

|
| 7. |
You must identify the target attribute which indicates what the model will predict. Highlight AFFINITY_CARD and click Next.

|
| 8. |
You can assign weights to the target values so that the model is forced to predict more of one particular value. In this tutorial, weighting is not used. Make sure No is selected and click Next.

|
| 9. |
Exclude the COMMENTS column and click Next.

|
| 10. |
Click Finish to create the model.

|
| 11. |
Right-click on the highlighted status and select View to initiate a Task window. Make sure the build completes successfully.
|
| 12. |
Now you are ready to test the model. select Model > Classif | |