| |
Using Oracle Data Mining to Predict Data Behavior
Module Objectives
Purpose
This module focuses on how to use the Data Mining capability of the Oracle9i
Database.
Objectives
After completing this module, you should be able to:
 |
Use the Naive Bayes model to predict the value of a designated
variable
|
 |
Use association rules to predict relationships
|
Prerequisites
Before starting this module, you should have completed the following:
Reference Material
The following is useful reference material if you want additional information
about the topics in this module:
 |
Documentation: Oracle9i
Data Mining Concepts
|
Overview
Oracle9i Data Mining (ODM) provides the means to find previously hidden
patterns and relationships in data to enable better decision-making; ODM
uses past data to predict the future. For example, you can develop applications
using data mining to gain customer insight. By mining data that has been
consolidated into one unified view, you can build detailed profiles of
your best customers and then find other customers who fit this profile.
With ODM, you can gain greater insight into customers needs and
then provide them with the right products and services.
With ODM, companies can tap information hidden in their corporate databases
to reveal new insights about their customers and their businesses. ODM
can also detect hidden patterns in life science, government, manufacturing,
and other applications, such as:
Finding target genes and promising leads
to attack diseases
Predicting the quality of a manufactured part
Finding associations between patients, drugs, and outcomes
Identifying possible network intrusions.
Data Mining Techniques
Data mining can be separated into supervised and unsupervised learning
techniques. Supervised learning requires the data analyst to identify
a dependent variable. This type of data mining technique builds models
that predict the probability of specific outcomes. It can be used to answer
questions such as who is likely to buy a mutual fund in the next
6 months and why? ODM provides the Naïve Bayes (NB) and Adaptive
Bayes Network (ABN) data mining algorithms for supervised learning types
of problems. Attribute Importance (for ranking the predictive power of
attributes) and ABN are new in ODM Release 2.
In unsupervised learning, the user does not indicate the objective to
the data mining algorithm. This type of data mining technique can be used
to identify co-occurring items or events. It can be used to answer questions
such as what are popular product bundles? ODM provides Association
Rules and two distinct clustering algorithms for unsupervised learning
types of problems. The Clustering unsupervised learning function is new
in ODM Release 2.
This lesson illustrates two of these algorithms, Naive Bayes and Association
Rules.
Data Mining in the Database
ODM simplifies the process of extracting business intelligence from large
amounts of data. It eliminates off-loading vast quantities of data to
external special-purpose analytic servers for data mining and scoring.
With ODM, all the data mining functionality is embedded in the Oracle9i
Database, so the data, data preparation, model building, and model scoring
activities remain in the database.
With all phases of data mining performed within the database, each data
mining phase results in significant improvements in productivity, automation,
and integration. These improvements are notable in data preparation, which
often can constitute as much as 80% of the data mining process.
Data Mining for Java Wizard in JDeveloper
Oracle will be introducing a new set of wizards in JDeveloper, available
from OTN in January, that will automatically build the Java components
required to perform each of the tasks in a data mining project. The Java
code will be constructed from the source data and parameters specified
by the user in the JDeveloper User Interface.
To see a glimpse, click the show me button below.

Set Up Data Mining Environment
You will need to perform the following tasks to set up the data mining
environment:
Unlock the Data Mining Accounts
| 1. |
From a SQL*Plus session logged on as SYS, enter the following:
alter user odm account unlock;
alter user odm_mtr account unlock;
|
Start the Oracle Data Mining Task Monitor
| 1. |
From a SQL*Plus session, execute the following:
connect odm/odm@orcl.world
exec odm_start_monitor
|
Edit the Database Connection Informatioin in Sample_Global.property
| 1. |
Open the Sample_Global.property file in wordpad. This file
is located in <ORACLE_HOME>\dm\demo\sample
|
| 2. |
Modify the following lines with your host and username information:
miningServer.url=jdbc:oracle:thin:@<hostname>:1521:orcl
miningServer.username=odm
miningServer.password=odm

Note: leave inputDataSchemaName and outputDataSchemaName to odm_mtr
When done, save changes and close the file.
|
Verify JAVA 1.3.1 Environment
| 1. |
Open a DOS window and execute the following commands:
java -version

|
Using Naive Bayes Model to Predict Targets
Naive Bayes is a supervised model used to predict the value of a designated
variable, called a target (in this case the MYPREDICTION variable which
is an indication of the income level of an individual), together with
the confidence associated with each prediction. Perform the following
steps:
Compile and Execute Sample_NaiveBayesBuild
The first step is to compile and execute the application. Perform the
following:
| 1. |
Open a DOS window and change directory to <ORACLE_HOME>\dm\demo\sample
and execute the following:
set ORACLE_HOME=d:\oracle\ora92
set JAVA_HOME=d:\oracle\ora92\jdk
compileSampleCode Sample_NaiveBayesBuild.java
executeSampleCode Sample_NaiveBayesBuild
Note: The above names are case sensitive.
|
| 2. |
Review the output as follows:
***********************************************
ODM 9.2.0 Sample Program: SampleNaiveBayesBuild
***********************************************
Property Files:
Global: Sample_Global.property
Sample: Sample_NaiveBayesBuild.property
***********************************************
Start: Tue Aug 27 16:34:37 EDT 2002
Initialization Phase:
Data Mining Server:
JDBC URL: jdbc:oracle:thin:@meyoung-lap:1521:orcl
Username: odm
Login Phase:
Completed MiningServer login
Cleanup Phase:
Cleanup flag: cleanupBeforeExecution
Removed function settings object: Sample_NB_MFS
Removed mining model object: Sample_NB_Model
Removed mining task: Sample_NB_Build_Task
Data Setup Creation Phase:
Created NonTransactional PDS
Input schema: odm_mtr
Input table: CENSUS_2D_BUILD_UNBINNED
Mining Settings Creation Phase:
Added supplemental attribute(s)
Persisted MiningFunctionSettings
Name: Sample_NB_MFS
Model Build Task Phase:
Invoking NaiveBayes Model build.
Built NaiveBayes Model
Name: Sample_NB_Model
Status: SUCCESS
Duration: 13 seconds
Logout Phase:
Completed MiningServer logout
End: Tue Aug 27 16:35:13 EDT 2002
***********************************************
The input table is sample census data which is unbinned (not categorized).
Oracle Data Mining will perform automatic binning. In addition,
the program has created a model that can use demographic information
to predict the income level of an individual (0=low, 1=High).
|
| 3. |
You can optionally check the progress of the build by querying
the odm_mining-task_state table. Open a SQL*Plus session and execute
the following:
connect odm/odm@orcl.world
column id format 999
column entry_timestamp format a30
column state format a15
select id, entry_timestamp, state from odm_mining_task_state;

|
Compile and Execute Sample_NaiveBayesLiftandTest
To measure the effectiveness of the model created in the Build, the Lift
and Test program applies the model to another subset of the census data;
the predicted values can then be compared to the actual Income Level values
of each case in the input data. Perform the following:
| 1. |
From your DOS window execute the following:
compileSampleCode Sample_NaiveBayesLiftAndTest.java
executeSampleCode Sample_NaiveBayesLiftAndTest
Note: The above names are case sensitive. Also, if you exited
the DOS window you were previously using, you will have to reset
the ORACLE_HOME and JAVA_HOME.
|
| 2. |
Review the output as follows:
******************************************************
ODM 9.2.0 Sample Program: Sample_NaiveBayesLiftAndTest
******************************************************
Property Files:
Global: Sample_Global.property
Sample: Sample_NaiveBayesLiftAndTest.property
******************************************************
Start: Tue Aug 27 16:24:29 EDT 2002
Initialization Phase:
Data Mining Server:
JDBC URL: jdbc:oracle:thin:@meyoung-lap:1521:orcl
Username: odm
Login Phase:
Completed MiningServer login
Cleanup Phase:
Cleanup flag: cleanupBeforeExecution
Data Setup Creation Phase:
Created NonTransactional PDS
Input schema: odm_mtr
Input table: CENSUS_2D_TEST_UNBINNED
Model Test Task Phase:
Invoking NaiveBayes Model test.
Status: SUCCESS
Result: Sample_NB_TEST
Duration: 7 seconds
Display Result Phase:
Test Result Id: 1.0
Accuracy: 0.8033573
Confusion Matrix:
Row Index: 0 Column Index: 0 Value: 546.0
Row Index: 0 Column Index: 1 Value: 83.0
Row Index: 1 Column Index: 0 Value: 81.0
Row Index: 1 Column Index: 1 Value: 124.0
Model Compute Lift Task Phase:
Invoking NaiveBayes Model computeLift
Status: SUCCESS
Result: Sample_NB_LIFT_RESULT
Duration: 4 seconds
Display Lift Result Phase:
LiftResultElement[0]:
Lift=2.9543555
LiftCumulative=2.9543555
NumberOfNonTargetsCumulative=23
NumberOfTargetsCumulative=61
PercentageOfRecordsCumulative=0.10071942
TargetDensity=0.72619045
TargetDensityCumulative=0.72619045
LiftResultElement[1]:
Lift=1.8888502
LiftCumulative=2.4216027
NumberOfNonTargetsCumulative=68
NumberOfTargetsCumulative=100
PercentageOfRecordsCumulative=0.20143884
TargetDensity=0.4642857
TargetDensityCumulative=0.5952381
LiftResultElement[2]:
Lift=1.7629268
LiftCumulative=2.2020442
NumberOfNonTargetsCumulative=115
NumberOfTargetsCumulative=136
PercentageOfRecordsCumulative=0.30215827
TargetDensity=0.43333334
TargetDensityCumulative=0.54126984
LiftResultElement[3]:
Lift=1.0461324
LiftCumulative=1.9130661
NumberOfNonTargetsCumulative=178
NumberOfTargetsCumulative=158
PercentageOfRecordsCumulative=0.4028777
TargetDensity=0.25714287
TargetDensityCumulative=0.4702381
LiftResultElement[4]:
Lift=0.7842492
LiftCumulative=1.689458
NumberOfNonTargetsCumulative=245
NumberOfTargetsCumulative=174
PercentageOfRecordsCumulative=0.5023981
TargetDensity=0.19277108
TargetDensityCumulative=0.41527447
LiftResultElement[5]:
Lift=0.7842492
LiftCumulative=1.5397921
NumberOfNonTargetsCumulative=312
NumberOfTargetsCumulative=190
PercentageOfRecordsCumulative=0.60191846
TargetDensity=0.19277108
TargetDensityCumulative=0.37848607
LiftResultElement[6]:
Lift=0.09803115
LiftCumulative=1.3352345
NumberOfNonTargetsCumulative=393
NumberOfTargetsCumulative=192
PercentageOfRecordsCumulative=0.70143884
TargetDensity=0.024096385
TargetDensityCumulative=0.32820514
LiftResultElement[7]:
Lift=0.24507788
LiftCumulative=1.199781
NumberOfNonTargetsCumulative=471
NumberOfTargetsCumulative=197
PercentageOfRecordsCumulative=0.8009592
TargetDensity=0.060240965
TargetDensityCumulative=0.2949102
LiftResultElement[8]:
Lift=0.24507788
LiftCumulative=1.0942678
NumberOfNonTargetsCumulative=549
NumberOfTargetsCumulative=202
PercentageOfRecordsCumulative=0.9004796
TargetDensity=0.060240965
TargetDensityCumulative=0.2689747
LiftResultElement[9]:
Lift=0.14704673
LiftCumulative=1.0
NumberOfNonTargetsCumulative=629
NumberOfTargetsCumulative=205
PercentageOfRecordsCumulative=1.0
TargetDensity=0.036144577
TargetDensityCumulative=0.24580336
Logout Phase:
Completed MiningServer logout
End: Tue Aug 27 16:25:34 EDT 2002
******************************************************
In the listing above, there is a section called Confusion Matrix
(it is in bold). The Confusion Matrix indicates how good the model
is at avoiding false positives (that is, a prediction of 1 when
the actual value is 0) and false negatives (that is, a prediction
of 0 when the actual value is 1). This is very important in applications
such as predicting cancer (false positive being OK), credit ratings
(false positive being BAD) row index is actual value, column index
is predicted value. For example, in 81 cases the model predicted
a value of 0 where the actual value was 1. If you were to map out
the values in a matrix diagram, it may look as follows:

|
| 3. |
You can also view the results of the lift in SQL*Plus. Execute
the following:
connect odm/odm@orcl.world
spool liftresult.txt
@lift_results
spool off

|
| 4. |
You now want to analyze the data. To build a chart to depict the
data, you will load the liftresults.txt file into Excel and
build the chart. Open Excel and create a new workbook. Then select
Data > Get External Data > Import Text File.
|
| 5. |
Find the liftresults.txt in your working directory (d:\wkdir) and
click Open. A wizard window will appear. Click the Fixed Width
radio button and change the Start import at row to 3 and
click Next.

|
| 6. |
Click Finish to import the data. Then click OK to
add the data to the existing worksheet.

|
| 7. |
To create the chart, select Insert > Chart. Select the
Line Chart type and click Next.

|
| 8. |
Drag select the data in the PCT_TGT_CUM column and select the Series
tab.

|
| 9. |
Specify Percent of Positive Targets in the Name field and
click Next.

|
| 10. |
Enter Number of Records for the X axis and Percent of
Positive Targets for the Y axis and click Finish.

|
| 11. |
The chart will appear. What this chart shows is that you can find
29% of likely buyers by soliciting only 10% of your customer database.

|
Compile and Execute Sample_NaiveBayesApply
Now that you are convinced the model is going to help your campaign,
you will apply the model to the real data. Perform the following:
| 1. |
From your DOS window execute the following:
compileSampleCode Sample_NaiveBayesApply.java
executeSampleCode Sample_NaiveBayesApply
Note: The above names are case sensitive. Also, if you exited
the DOS window you were previously using, you will have to reset
the ORACLE_HOME and JAVA_HOME.
|
| 2. |
Review the output as follows:
***********************************************
ODM 9.2.0 Sample Program: SampleNaiveBayesApply
***********************************************
Property Files:
Global: Sample_Global.property
Sample: Sample_NaiveBayesApply.property
***********************************************
Start: Tue Aug 27 16:32:28 EDT 2002
Initialization Phase:
Data Mining Server:
JDBC URL: jdbc:oracle:thin:@meyoung-lap:1521:orcl
Username: odm
Login Phase:
Completed MiningServer login
Cleanup Phase:
Cleanup flag: cleanupBeforeExecution
Data Setup Creation Phase:
Created NonTransactional PDS
Input schema: odm_mtr
Input table: CENSUS_2D_APPLY_UNBINNED
MiningApplyOutput Creation Phase:
Added source attributes
Added targetProbability attributes
Created MiningApplyOutput
Model Apply Task Phase:
Invoking NaiveBayes Model apply.
Status: SUCCESS
Duration: 7 seconds
Display Apply Table Result Phase:
Apply output result table name: CENSUS_NB_APPLY_RESULT
Apply output result table schema name: odm_mtr
Logout Phase:
Completed MiningServer logout
End: Tue Aug 27 16:33:03 EDT 2002
***********************************************
|
| 3. |
You can also view the results in SQL*Plus. Execute the following:
connect odm_mtr/odm_mtr@orcl.world
column id format 9999
select * from census_nb_apply_result
order by myprediction desc, myprobability desc;

| MYPREDICTION |
1 means a positive target is predicted |
| MYPROBABILITY |
Indicates confidence of the prediction |
| MYRANK |
1 indicates that this prediction has the highest confidence
for this case. An alternate display technique shows all possible
predictions for a case.
|
| ID |
The Case Identifier |
|
Perform a Market Basket Analysis using Association Rules
Association Rules are used to predict relationships. Association Rules
are used to perform market basket analysis. Perform the following steps:
Review Parameters in the Property File
| 1. |
Edit the Sample_AssociationRules.property file in wordpad.
If this file does not exist, open Sample_AssociationRules_Transactional.property
and save it as Sample_AssociationRules.property. The file
is contained in the <ORACLE_HOME>\dm\demo\sample directory.
Review the following parameters:
associationRules.minimumSupport=0.01
associationRules.minimumConfidence=0
associationRules.maximumRuleLength=2
minimumSupport is
a value between 0 and 1 that specifies a threshold for the frequency
of the components of a rule occurring together. For example, if
the rule is "A implies B" then the number of cases in
which A and B occur simultaneously are counted, as a percentage
of all cases. Itemsets with Support below the minimum are eliminated,
making the calculations faster.
minimumConfidence
specifies a threshold for the conditional probability for a rule.
The number of cases containing A and B together are counted, as
a percentage of cases containing A. Rules with Confidence below
the threshold are eliminated, making the calculations faster.
maximumRuleLength
specifies the number of components in a rule. The default setting
in the sample property files is 2, forcing all rules to be of the
type "A implies B", rather than "A and B imply C",
or longer.
You may want to alter the value of some of these parameters to
see the affect on the result.
|
Compile and Execute Sample_AssociationRules
| 1. |
From your DOS window execute the following:
compileSampleCode Sample_AssociationRules.java
executeSampleCode Sample_AssociationRules
|
| 2. |
Review the output as follows:
*************************************************
ODM 9.2.0 Sample Program: Sample_AssociationRules
*************************************************
Property Files:
Global: Sample_Global.property
Sample: Sample_AssociationRules.property
*************************************************
Start: Tue Aug 27 17:04:53 EDT 2002
Initialization Phase:
Data Mining Server:
JDBC URL: jdbc:oracle:thin:@meyoung-lap:1521:orcl
Username: odm
Login Phase:
Completed MiningServer login
Cleanup Phase:
Cleanup flag: cleanupBeforeExecution
Data Setup Creation Phase:
Created Transactional PDS
Sequence id: SEQUENCE_ID
Attribute name: ATTRIBUTE_NAME
Value: VALUE
Input schema: odm_mtr
Input table: MARKET_BASKET_TX_BINNED
Mining Settings Creation Phase:
Persisted MiningFunctionSettings
Name: Sample_AR_MFS_tx
Model Build Task Phase:
Invoking AssociatedRule Model build.
Built AssociatedRule Model
Name: Sample_AR_Model_tx
Status: SUCCESS
Duration: 12 seconds
Display Result Phase:
Getting top 5 rules for model: Sample_AR_Model_tx sorted by support.
Rule 108: If TV_DINNER=1 then CANNED_GOODS=1 [support: 0.173, confidence: 0.57284766]
Rule 60: If CANNED_GOODS=1 then TV_DINNER=1 [support: 0.173, confidence: 0.5709571]
Rule 6: If BEER=1 then TV_DINNER=1 [support: 0.17, confidence: 0.5802048]
Rule 62: If TV_DINNER=1 then BEER=1 [support: 0.17, confidence: 0.5629139]
Rule 42: If BEER=1 then CANNED_GOODS=1 [support: 0.167, confidence: 0.5699659]
Get rules by support: Sample_AR_Model_tx, with minimum support of 0.16.
Rule 108: If TV_DINNER=1 then CANNED_GOODS=1 [support: 0.173, confidence: 0.57284766]
Rule 60: If CANNED_GOODS=1 then TV_DINNER=1 [support: 0.173, confidence: 0.5709571]
Rule 6: If BEER=1 then TV_DINNER=1 [support: 0.17, confidence: 0.5802048]
Rule 62: If TV_DINNER=1 then BEER=1 [support: 0.17, confidence: 0.5629139]
Rule 42: If BEER=1 then CANNED_GOODS=1 [support: 0.167, confidence: 0.5699659]
Rule 41: If CANNED_GOODS=1 then BEER=1 [support: 0.167, confidence: 0.5511551]
Get rules by confidence: Sample_AR_Model_tx, with confidence of 0.56 or more.
Rule 6: If BEER=1 then TV_DINNER=1 [support: 0.17, confidence: 0.5802048]
Rule 108: If TV_DINNER=1 then CANNED_GOODS=1 [support: 0.173, confidence: 0.57284766]
Rule 60: If CANNED_GOODS=1 then TV_DINNER=1 [support: 0.173, confidence: 0.5709571]
Rule 42: If BEER=1 then CANNED_GOODS=1 [support: 0.167, confidence: 0.5699659]
Rule 62: If TV_DINNER=1 then BEER=1 [support: 0.17, confidence: 0.5629139]
Logout Phase:
Completed MiningServer logout
End: Tue Aug 27 17:05:31 EDT 2002
*************************************************
Results are displayed using three different criteria. Support is
an indication of the number of times cases were observed together.
Confidence indicates the conditional probability - in 0.562 cases
of TV_dinner being purchased, beer was also purchased.
|
Module Summary
In this module, you should have learned how to:
 |
Use the Naive Bayes model to predict the value of a designated
variable
|
 |
Use association rules to predict relationships
|
Copyright © 2002 Oracle Corporation. All Rights Reserved.
Close Window
|