Legal | Privacy
Using Oracle Data Mining to Predict Data Behavior
 
 

Using Oracle Data Mining to Predict Data Behavior

Module Objectives

Purpose

This module focuses on how to use the Data Mining capability of the Oracle9i Database.

Objectives

After completing this module, you should be able to:

Use the Naive Bayes model to predict the value of a designated variable

Use association rules to predict relationships

Prerequisites

Before starting this module, you should have completed the following:

Preinstallation Tasks
Installing the Oracle9i Database
Postinstallation Tasks

Reference Material

The following is useful reference material if you want additional information about the topics in this module:

Documentation: Oracle9i Data Mining Concepts

 

Overview

Oracle9i Data Mining (ODM) provides the means to find previously hidden patterns and relationships in data to enable better decision-making; ODM uses past data to predict the future. For example, you can develop applications using data mining to gain customer insight. By mining data that has been consolidated into one unified view, you can build detailed profiles of your best customers and then find other customers who fit this profile. With ODM, you can gain greater insight into customers’ needs and then provide them with the right products and services.

With ODM, companies can tap information hidden in their corporate databases to reveal new insights about their customers and their businesses. ODM can also detect hidden patterns in life science, government, manufacturing, and other applications, such as:
• Finding “target” genes and promising “leads” to attack diseases
• Predicting the quality of a manufactured part
• Finding associations between patients, drugs, and outcomes
• Identifying possible network intrusions.

Data Mining Techniques

Data mining can be separated into supervised and unsupervised learning techniques. Supervised learning requires the data analyst to identify a dependent variable. This type of data mining technique builds models that predict the probability of specific outcomes. It can be used to answer questions such as “who is likely to buy a mutual fund in the next 6 months and why?” ODM provides the Naïve Bayes (NB) and Adaptive Bayes Network (ABN) data mining algorithms for supervised learning types of problems. Attribute Importance (for ranking the predictive power of attributes) and ABN are new in ODM Release 2.

In unsupervised learning, the user does not indicate the objective to the data mining algorithm. This type of data mining technique can be used to identify co-occurring items or events. It can be used to answer questions such as “what are popular product bundles?” ODM provides Association Rules and two distinct clustering algorithms for unsupervised learning types of problems. The Clustering unsupervised learning function is new in ODM Release 2.

This lesson illustrates two of these algorithms, Naive Bayes and Association Rules.

Data Mining in the Database

ODM simplifies the process of extracting business intelligence from large amounts of data. It eliminates off-loading vast quantities of data to external special-purpose analytic servers for data mining and scoring. With ODM, all the data mining functionality is embedded in the Oracle9i Database, so the data, data preparation, model building, and model scoring activities remain in the database.

With all phases of data mining performed within the database, each data mining phase results in significant improvements in productivity, automation, and integration. These improvements are notable in data preparation, which often can constitute as much as 80% of the data mining process.

Data Mining for Java Wizard in JDeveloper

Oracle will be introducing a new set of wizards in JDeveloper, available from OTN in January, that will automatically build the Java components required to perform each of the tasks in a data mining project. The Java code will be constructed from the source data and parameters specified by the user in the JDeveloper User Interface.

To see a glimpse, click the show me button below.

Set Up Data Mining Environment

You will need to perform the following tasks to set up the data mining environment:

1.

Unlock the data mining accounts

2. Start the Oracle Data Mining Task Monitor
3. Edit the Database connection information in Sample_Global.property
4. Verify JAVA 1.3.1 environment

Unlock the Data Mining Accounts

1.

From a SQL*Plus session logged on as SYS, enter the following:

alter user odm account unlock;
alter user odm_mtr account unlock;

 

Start the Oracle Data Mining Task Monitor

1.

From a SQL*Plus session, execute the following:

connect odm/odm@orcl.world
exec odm_start_monitor

 

Edit the Database Connection Informatioin in Sample_Global.property

1.

Open the Sample_Global.property file in wordpad. This file is located in <ORACLE_HOME>\dm\demo\sample

 

2.

Modify the following lines with your host and username information:

miningServer.url=jdbc:oracle:thin:@<hostname>:1521:orcl
miningServer.username=odm 
miningServer.password=odm

Note: leave inputDataSchemaName and outputDataSchemaName to odm_mtr

When done, save changes and close the file.

 

Verify JAVA 1.3.1 Environment

1.

Open a DOS window and execute the following commands:

java -version

 

 

Using Naive Bayes Model to Predict Targets

Naive Bayes is a supervised model used to predict the value of a designated variable, called a target (in this case the MYPREDICTION variable which is an indication of the income level of an individual), together with the confidence associated with each prediction. Perform the following steps:

1.

Compile and Execute Sample_NaiveBayesBuild

2. Compile and Execute Sample_NaiveBayesLiftandTest
3. Compile and Execute Sample_NaiveBayesApply

Compile and Execute Sample_NaiveBayesBuild

The first step is to compile and execute the application. Perform the following:

1.

Open a DOS window and change directory to <ORACLE_HOME>\dm\demo\sample and execute the following:

set ORACLE_HOME=d:\oracle\ora92
set JAVA_HOME=d:\oracle\ora92\jdk
compileSampleCode Sample_NaiveBayesBuild.java
executeSampleCode Sample_NaiveBayesBuild

Note: The above names are case sensitive.

 
2.

Review the output as follows:

***********************************************
ODM 9.2.0 Sample Program: SampleNaiveBayesBuild
***********************************************
Property Files:
    Global: Sample_Global.property
    Sample: Sample_NaiveBayesBuild.property
***********************************************
Start: Tue Aug 27 16:34:37 EDT 2002
Initialization Phase:
    Data Mining Server:
        JDBC URL: jdbc:oracle:thin:@meyoung-lap:1521:orcl
        Username: odm
Login Phase:
    Completed MiningServer login
Cleanup Phase:
    Cleanup flag: cleanupBeforeExecution
    Removed function settings object: Sample_NB_MFS
    Removed mining model object: Sample_NB_Model
    Removed mining task: Sample_NB_Build_Task
Data Setup Creation Phase:
    Created NonTransactional PDS
    Input schema: odm_mtr
    Input table: CENSUS_2D_BUILD_UNBINNED
Mining Settings Creation Phase:
    Added supplemental attribute(s)
    Persisted MiningFunctionSettings
    Name: Sample_NB_MFS
Model Build Task Phase:
    Invoking NaiveBayes Model build.
    Built NaiveBayes Model
    Name: Sample_NB_Model
    Status: SUCCESS
    Duration: 13 seconds
Logout Phase:
    Completed MiningServer logout
End: Tue Aug 27 16:35:13 EDT 2002
***********************************************

The input table is sample census data which is unbinned (not categorized). Oracle Data Mining will perform automatic binning. In addition, the program has created a model that can use demographic information to predict the income level of an individual (0=low, 1=High).

 

3.

You can optionally check the progress of the build by querying the odm_mining-task_state table. Open a SQL*Plus session and execute the following:

connect odm/odm@orcl.world 
column id format 999
column entry_timestamp format a30
column state format a15
select id, entry_timestamp, state from odm_mining_task_state;

 

Compile and Execute Sample_NaiveBayesLiftandTest

To measure the effectiveness of the model created in the Build, the Lift and Test program applies the model to another subset of the census data; the predicted values can then be compared to the actual Income Level values of each case in the input data. Perform the following:

1.

From your DOS window execute the following:

compileSampleCode Sample_NaiveBayesLiftAndTest.java
executeSampleCode Sample_NaiveBayesLiftAndTest

Note: The above names are case sensitive. Also, if you exited the DOS window you were previously using, you will have to reset the ORACLE_HOME and JAVA_HOME.

2.

Review the output as follows:

******************************************************
ODM 9.2.0 Sample Program: Sample_NaiveBayesLiftAndTest
******************************************************
Property Files:
    Global: Sample_Global.property
    Sample: Sample_NaiveBayesLiftAndTest.property
******************************************************
Start: Tue Aug 27 16:24:29 EDT 2002
Initialization Phase:
    Data Mining Server:
        JDBC URL: jdbc:oracle:thin:@meyoung-lap:1521:orcl
        Username: odm
Login Phase:
    Completed MiningServer login
Cleanup Phase:
    Cleanup flag: cleanupBeforeExecution
Data Setup Creation Phase:
    Created NonTransactional PDS
    Input schema: odm_mtr
    Input table: CENSUS_2D_TEST_UNBINNED
Model Test Task Phase:
    Invoking NaiveBayes Model test.
    Status: SUCCESS
    Result: Sample_NB_TEST
    Duration: 7 seconds
Display Result Phase:
Test Result Id: 1.0
Accuracy: 0.8033573
Confusion Matrix:
    Row Index: 0 Column Index: 0 Value: 546.0
    Row Index: 0 Column Index: 1 Value: 83.0
    Row Index: 1 Column Index: 0 Value: 81.0
    Row Index: 1 Column Index: 1 Value: 124.0
Model Compute Lift Task Phase:
    Invoking NaiveBayes Model computeLift
    Status: SUCCESS
    Result: Sample_NB_LIFT_RESULT
    Duration: 4 seconds
Display Lift Result Phase:
    LiftResultElement[0]:
        Lift=2.9543555
        LiftCumulative=2.9543555
        NumberOfNonTargetsCumulative=23
        NumberOfTargetsCumulative=61
        PercentageOfRecordsCumulative=0.10071942
        TargetDensity=0.72619045
        TargetDensityCumulative=0.72619045
    LiftResultElement[1]:
        Lift=1.8888502
        LiftCumulative=2.4216027
        NumberOfNonTargetsCumulative=68
        NumberOfTargetsCumulative=100
        PercentageOfRecordsCumulative=0.20143884
        TargetDensity=0.4642857
        TargetDensityCumulative=0.5952381
    LiftResultElement[2]:
        Lift=1.7629268
        LiftCumulative=2.2020442
        NumberOfNonTargetsCumulative=115
        NumberOfTargetsCumulative=136
        PercentageOfRecordsCumulative=0.30215827
        TargetDensity=0.43333334
        TargetDensityCumulative=0.54126984
    LiftResultElement[3]:
        Lift=1.0461324
        LiftCumulative=1.9130661
        NumberOfNonTargetsCumulative=178
        NumberOfTargetsCumulative=158
        PercentageOfRecordsCumulative=0.4028777
        TargetDensity=0.25714287
        TargetDensityCumulative=0.4702381
    LiftResultElement[4]:
        Lift=0.7842492
        LiftCumulative=1.689458
        NumberOfNonTargetsCumulative=245
        NumberOfTargetsCumulative=174
        PercentageOfRecordsCumulative=0.5023981
        TargetDensity=0.19277108
        TargetDensityCumulative=0.41527447
    LiftResultElement[5]:
        Lift=0.7842492
        LiftCumulative=1.5397921
        NumberOfNonTargetsCumulative=312
        NumberOfTargetsCumulative=190
        PercentageOfRecordsCumulative=0.60191846
        TargetDensity=0.19277108
        TargetDensityCumulative=0.37848607
    LiftResultElement[6]:
        Lift=0.09803115
        LiftCumulative=1.3352345
        NumberOfNonTargetsCumulative=393
        NumberOfTargetsCumulative=192
        PercentageOfRecordsCumulative=0.70143884
        TargetDensity=0.024096385
        TargetDensityCumulative=0.32820514
    LiftResultElement[7]:
        Lift=0.24507788
        LiftCumulative=1.199781
        NumberOfNonTargetsCumulative=471
        NumberOfTargetsCumulative=197
        PercentageOfRecordsCumulative=0.8009592
        TargetDensity=0.060240965
        TargetDensityCumulative=0.2949102
    LiftResultElement[8]:
        Lift=0.24507788
        LiftCumulative=1.0942678
        NumberOfNonTargetsCumulative=549
        NumberOfTargetsCumulative=202
        PercentageOfRecordsCumulative=0.9004796
        TargetDensity=0.060240965
        TargetDensityCumulative=0.2689747
    LiftResultElement[9]:
        Lift=0.14704673
        LiftCumulative=1.0
        NumberOfNonTargetsCumulative=629
        NumberOfTargetsCumulative=205
        PercentageOfRecordsCumulative=1.0
        TargetDensity=0.036144577
        TargetDensityCumulative=0.24580336
Logout Phase:
    Completed MiningServer logout
End: Tue Aug 27 16:25:34 EDT 2002
******************************************************

In the listing above, there is a section called Confusion Matrix (it is in bold). The Confusion Matrix indicates how good the model is at avoiding false positives (that is, a prediction of 1 when the actual value is 0) and false negatives (that is, a prediction of 0 when the actual value is 1). This is very important in applications such as predicting cancer (false positive being OK), credit ratings (false positive being BAD) row index is actual value, column index is predicted value. For example, in 81 cases the model predicted a value of 0 where the actual value was 1. If you were to map out the values in a matrix diagram, it may look as follows:

 

3.

You can also view the results of the lift in SQL*Plus. Execute the following:

connect odm/odm@orcl.world
spool liftresult.txt
@lift_results
spool off

 

4.

You now want to analyze the data. To build a chart to depict the data, you will load the liftresults.txt file into Excel and build the chart. Open Excel and create a new workbook. Then select Data > Get External Data > Import Text File.

 

5.

Find the liftresults.txt in your working directory (d:\wkdir) and click Open. A wizard window will appear. Click the Fixed Width radio button and change the Start import at row to 3 and click Next.

 

6.

Click Finish to import the data. Then click OK to add the data to the existing worksheet.

 

7.

To create the chart, select Insert > Chart. Select the Line Chart type and click Next.

 

8.

Drag select the data in the PCT_TGT_CUM column and select the Series tab.

 

9.

Specify Percent of Positive Targets in the Name field and click Next.

 

10.

Enter Number of Records for the X axis and Percent of Positive Targets for the Y axis and click Finish.

 

11.

The chart will appear. What this chart shows is that you can find 29% of likely buyers by soliciting only 10% of your customer database.

 

Compile and Execute Sample_NaiveBayesApply

Now that you are convinced the model is going to help your campaign, you will apply the model to the real data. Perform the following:

1.

From your DOS window execute the following:

compileSampleCode Sample_NaiveBayesApply.java
executeSampleCode Sample_NaiveBayesApply

Note: The above names are case sensitive. Also, if you exited the DOS window you were previously using, you will have to reset the ORACLE_HOME and JAVA_HOME.

 

2.

Review the output as follows:

***********************************************
ODM 9.2.0 Sample Program: SampleNaiveBayesApply
***********************************************
Property Files:
    Global: Sample_Global.property
    Sample: Sample_NaiveBayesApply.property
***********************************************
Start: Tue Aug 27 16:32:28 EDT 2002
Initialization Phase:
    Data Mining Server:
        JDBC URL: jdbc:oracle:thin:@meyoung-lap:1521:orcl
        Username: odm
Login Phase:
    Completed MiningServer login
Cleanup Phase:
    Cleanup flag: cleanupBeforeExecution
Data Setup Creation Phase:
    Created NonTransactional PDS
    Input schema: odm_mtr
    Input table: CENSUS_2D_APPLY_UNBINNED
MiningApplyOutput Creation Phase:
    Added source attributes
    Added targetProbability attributes
    Created MiningApplyOutput
Model Apply Task Phase:
    Invoking NaiveBayes Model apply.
    Status: SUCCESS
    Duration: 7 seconds
Display Apply Table Result Phase:
Apply output result table name: CENSUS_NB_APPLY_RESULT
Apply output result table schema name: odm_mtr
Logout Phase:
    Completed MiningServer logout
End: Tue Aug 27 16:33:03 EDT 2002
***********************************************

 

3.

You can also view the results in SQL*Plus. Execute the following:

connect odm_mtr/odm_mtr@orcl.world 
column id format 9999
select * from census_nb_apply_result 
  order by myprediction desc, myprobability desc;

MYPREDICTION 1 means a positive target is predicted
MYPROBABILITY Indicates confidence of the prediction
MYRANK

1 indicates that this prediction has the highest confidence for this case. An alternate display technique shows all possible predictions for a case.

ID The Case Identifier

 

 

Perform a Market Basket Analysis using Association Rules

Association Rules are used to predict relationships. Association Rules are used to perform market basket analysis. Perform the following steps:

1.

Review Parameters in the Property file

2. Compile and Execute Sample_AssociationRules

Review Parameters in the Property File

1.

Edit the Sample_AssociationRules.property file in wordpad. If this file does not exist, open Sample_AssociationRules_Transactional.property and save it as Sample_AssociationRules.property. The file is contained in the <ORACLE_HOME>\dm\demo\sample directory. Review the following parameters:

associationRules.minimumSupport=0.01
associationRules.minimumConfidence=0
associationRules.maximumRuleLength=2

minimumSupport is a value between 0 and 1 that specifies a threshold for the frequency of the components of a rule occurring together. For example, if the rule is "A implies B" then the number of cases in which A and B occur simultaneously are counted, as a percentage of all cases. Itemsets with Support below the minimum are eliminated, making the calculations faster.

minimumConfidence specifies a threshold for the conditional probability for a rule. The number of cases containing A and B together are counted, as a percentage of cases containing A. Rules with Confidence below the threshold are eliminated, making the calculations faster.

maximumRuleLength specifies the number of components in a rule. The default setting in the sample property files is 2, forcing all rules to be of the type "A implies B", rather than "A and B imply C", or longer.

You may want to alter the value of some of these parameters to see the affect on the result.

 

Compile and Execute Sample_AssociationRules

1.

From your DOS window execute the following:

compileSampleCode Sample_AssociationRules.java
executeSampleCode Sample_AssociationRules
 

2.

Review the output as follows:

*************************************************
ODM 9.2.0 Sample Program: Sample_AssociationRules
*************************************************
Property Files:
    Global: Sample_Global.property
    Sample: Sample_AssociationRules.property
*************************************************
Start: Tue Aug 27 17:04:53 EDT 2002
Initialization Phase:
    Data Mining Server:
        JDBC URL: jdbc:oracle:thin:@meyoung-lap:1521:orcl
        Username: odm
Login Phase:
    Completed MiningServer login
Cleanup Phase:
    Cleanup flag: cleanupBeforeExecution
Data Setup Creation Phase:
    Created Transactional PDS
        Sequence id: SEQUENCE_ID
        Attribute name: ATTRIBUTE_NAME
        Value: VALUE
    Input schema: odm_mtr
    Input table: MARKET_BASKET_TX_BINNED
Mining Settings Creation Phase:
    Persisted MiningFunctionSettings
    Name: Sample_AR_MFS_tx
Model Build Task Phase:
    Invoking AssociatedRule Model build.
    Built AssociatedRule Model
    Name: Sample_AR_Model_tx
    Status: SUCCESS
    Duration: 12 seconds
Display Result Phase:
    Getting top 5 rules for model: Sample_AR_Model_tx sorted by support.
        Rule 108: If TV_DINNER=1 then CANNED_GOODS=1 [support: 0.173, confidence: 0.57284766]
        Rule 60: If CANNED_GOODS=1 then TV_DINNER=1 [support: 0.173, confidence: 0.5709571]
        Rule 6: If BEER=1 then TV_DINNER=1 [support: 0.17, confidence: 0.5802048]
        Rule 62: If TV_DINNER=1 then BEER=1 [support: 0.17, confidence: 0.5629139]
        Rule 42: If BEER=1 then CANNED_GOODS=1 [support: 0.167, confidence: 0.5699659]
    Get rules by support: Sample_AR_Model_tx, with minimum support of 0.16.
        Rule 108: If TV_DINNER=1 then CANNED_GOODS=1 [support: 0.173, confidence: 0.57284766]
        Rule 60: If CANNED_GOODS=1 then TV_DINNER=1 [support: 0.173, confidence: 0.5709571]
        Rule 6: If BEER=1 then TV_DINNER=1 [support: 0.17, confidence: 0.5802048]
        Rule 62: If TV_DINNER=1 then BEER=1 [support: 0.17, confidence: 0.5629139]
        Rule 42: If BEER=1 then CANNED_GOODS=1 [support: 0.167, confidence: 0.5699659]
        Rule 41: If CANNED_GOODS=1 then BEER=1 [support: 0.167, confidence: 0.5511551]
    Get rules by confidence: Sample_AR_Model_tx, with confidence of 0.56 or more.
        Rule 6: If BEER=1 then TV_DINNER=1 [support: 0.17, confidence: 0.5802048]
        Rule 108: If TV_DINNER=1 then CANNED_GOODS=1 [support: 0.173, confidence: 0.57284766]
        Rule 60: If CANNED_GOODS=1 then TV_DINNER=1 [support: 0.173, confidence: 0.5709571]
        Rule 42: If BEER=1 then CANNED_GOODS=1 [support: 0.167, confidence: 0.5699659]
        Rule 62: If TV_DINNER=1 then BEER=1 [support: 0.17, confidence: 0.5629139]
Logout Phase:
    Completed MiningServer logout
End: Tue Aug 27 17:05:31 EDT 2002
*************************************************

Results are displayed using three different criteria. Support is an indication of the number of times cases were observed together. Confidence indicates the conditional probability - in 0.562 cases of TV_dinner being purchased, beer was also purchased.

 

Module Summary

In this module, you should have learned how to:

Use the Naive Bayes model to predict the value of a designated variable

Use association rules to predict relationships



Copyright © 2002 Oracle Corporation. All Rights Reserved.

Close Window

E-mail this page
Printer View Printer View
Oracle Is The Information Company About Oracle | Oracle RSS Feeds | Careers | Contact Us | Site Maps | Legal Notices | Terms of Use | Privacy