As Published In
Oracle Magazine
March/April 2006

TECHNOLOGY: Business Intelligence


Learn and Predict

By Ron Hardman Oracle ACE

Explain your data and forecast results, using predictive analytics.

We predict potential outcomes based on actions every day. If I speed in my car, I have a much higher likelihood of being pulled over by the police than if I observe the speed limit. Choosing a burger and fries every day at lunch rather than a more nutritious meal is likely to lead to poor health over time. Each choice and potential outcome are part of our life's data set. The amount of data is relatively small, and the potential outcomes are well-known. We don't give the decision process much thought.

Many data sets we collect are not as easily analyzed, though. Large data sets such as those related to housing, education, security, taxation, sales, and even sports are too large to analyze by hand. Not all patterns are easily recognized; many contain seemingly unrelated data.

What if it were possible to know the factors that lead to certain outcomes and predict with confidence the impact of decisions? This is what predictive analytics is about.

Overview

This article examines three implementations of the predictive analytics feature in Oracle Data Mining and demonstrates them against a sample data set.

The objective of this analysis and demonstration of predictive analytics is to find out which attributes in the sample data set are related to the result and the degree to which they influence the final outcome.

Note that the steps in this article require the Oracle Data Mining option. To verify that this option is installed, run the following SELECT statement: 

SELECT parameter, value
  FROM v$option
  WHERE parameter = 'Data Mining';


This must return TRUE.

Predictive Analytics

Data mining is about finding patterns in data and reporting these patterns to the user, who can then use that information to predict future events. Oracle Database 10g Release 2 adds a new PL/SQL package called DBMS_PREDICTIVE_ANALYTICS. The Oracle Data Miner application includes a one-click interface to this package. The predictive analytics spreadsheet add-in provides another interface.

The predictive analytics data mining feature analyzes data, finds patterns, and predicts outcomes or results with an associated level of confidence. The confidence is important, because it quantifies the certainty of the predictions. The range of values is 0 to 1. The higher the confidence value, the more certain the prediction. Getting started with the sample data. The example in this article, including the data set, is available for download. The ZIP file is called DogsPlayingPoker.zip. To walk through the steps, you need to create a schema; run the tables.sql script from the ZIP file; and load the data, using SQL*Loader with the following syntax: 

sqlldr <<username>>/<<password>> control=loader.ctl


Replace <<username>> and <<password>> with your own, and add the directory path to the control file, if necessary.

When the CANINE_POKER_RESULTS table is successfully loaded, it will contain 5,916 records.

Command-line implementation. The DBMS_PREDICTIVE_ANALYTICS package is new in Oracle Database 10g Release 2. The package definition is shown in Listing 1.

Listing 1: Describing the DBMS_PREDICTIVE_ANALYTICS package 

DESC DBMS_PREDICTIVE_ANALYTICS

PROCEDURE EXPLAIN
 Argument Name          Type          In/Out    Default?
 ---------------        -----------   ----      -----
 DATA_TABLE_NAME        VARCHAR2      IN
 EXPLAIN_COLUMN_NAME    VARCHAR2      IN
 RESULT_TABLE_NAME      VARCHAR2      IN
 DATA_SCHEMA_NAME       VARCHAR2      IN        DEFAULT

PROCEDURE PREDICT
 Argument Name          Type          In/Out    Default?
 ---------------        -----------   ----      -----
 ACCURACY               NUMBER        OUT
 DATA_TABLE_NAME        VARCHAR2      IN
 CASE_ID_COLUMN_NAME    VARCHAR2      IN
 TARGET_COLUMN_NAME     VARCHAR2      IN
 RESULT_TABLE_NAME      VARCHAR2      IN
 DATA_SCHEMA_NAME       VARCHAR2      IN        DEFAULT


The EXPLAIN procedure in the DBMS_PREDICTIVE_ANALYTICS package analyzes the data set to determine the explanatory value of each attribute in the data set. The higher the explanatory value for an attribute, the stronger the relationship between it and the result.

In this article, the EXPLAIN procedure reads the data from the CANINE_POKER_RESULTS table, creates the EXPLAIN_RESULTS table, and writes the results from the procedure to it. The following anonymous block executes the EXPLAIN procedure: 

BEGIN
  DBMS_PREDICTIVE_ANALYTICS.EXPLAIN (
     DATA_TABLE_NAME        => 
              'CANINE_POKER_RESULTS',
     EXPLAIN_COLUMN_NAME => 
                            'FINISH',
     RESULT_TABLE_NAME     => 
                   'EXPLAIN_RESULTS');
END;
/


Listing 2 selects the results from the EXPLAIN_RESULTS table.

Listing 2: Viewing the EXPLAIN Results 

SET PAGES 9999
SELECT *
  FROM EXPLAIN_RESULTS;

ATTRIBUTE_NAME          EXPLANATORY_VALUE       RANK
--------------          -----------             -----
FOOD_TYPE                .269033829             1 
GAME                     .261283299             2
SHAGGY_EYEBROWS          .030682642             3
AGE                     0                       4
CASE_ID                 0                       4
BREED                   0                       4


The three attributes in Listing 2 that have a positive EXPLANATORY_VALUE are the attributes that can be used to predict the success each dog will have in a poker game. The type of food (FOOD_TYPE) consumed by the dog is most closely aligned with the result, followed closely by the type of game played (GAME). Shaggy eyebrows (SHAGGY_EYEBROWS) come in a distant third. CASE_ID is the unique identifier for each record, so there is no correlation between it and the result. The analysis also shows an EXPLANATORY_VALUE of 0 for both AGE and BREED, so there is no pattern that would allow them to be used in the prediction.

The code in Listing 3 predicts the probability of winning a game, using the PREDICT procedure.

Listing 3: Running PREDICT 

SET SERVEROUTPUT ON
DECLARE
   v_predict_accuracy NUMBER(30,10);
BEGIN
   DBMS_PREDICTIVE_ANALYTICS.PREDICT (
       ACCURACY                        => v_predict_accuracy,
       DATA_TABLE_NAME            => 'CANINE_POKER_RESULTS',
       CASE_ID_COLUMN_NAME    => 'CASE_ID',
       TARGET_COLUMN_NAME      => 'FINISH',
       RESULT_TABLE_NAME         => 'PREDICT_RESULTS');
      
   DBMS_OUTPUT.PUT_LINE('*** Accuracy ***');
   DBMS_OUTPUT.PUT_LINE(v_predict_accuracy);
END;
/

*** Accuracy ***
.5666155894


Listing 3 shows that I can expect a better-than-average—57 percent (for the value .5666155894)—ability to predict the outcome based on the three attributes identified with the EXPLAIN procedure. Using predictive analytics increases the accuracy of predicting the winner by 57 percent over the accuracy of using the average of the target value (the FINISH column).

Oracle Data Miner implementation. Note: Oracle Data Miner 10.2's graphical user interface is not included with Oracle Database 10g Release 2. You can download it from oracle.com/technology/products/bi/odm.

You can use the Oracle Data Miner user interface to quickly and easily arrive at the same results as with the command-line implementation. To use Oracle Data Miner to explain the data, perform the following steps: 

  1. Start Oracle Data Miner 10.2, by launching odminerw.exe from your install directory.

  2. Create a new connection to the schema used in the command-line implementation, and click OK to connect.

  3. Select <<schema_name>> -> Data Sources -> Tables , and click the table named CANINE_POKER_RESULTS .

  4. Select Data -> Explain to launch the Predictive Analytics Wizard.

  5. Verify that the schema and table name are correct.

  6. Provide the target attribute (the FINISH column) and the target table (leave the default).

  7. Click Finish to begin analyzing the data.

The target table is created and populated with the EXPLAIN results.

To see the results, press the F5 key on your keyboard to refresh Oracle Data Miner and find the target table you created. The Data tab displays the results, as shown in Figure 1.

 

figure 1
Figure 1: Oracle Data Miner Explain


Predict is done in similar fashion. Select Data -> Predict , and follow the instructions provided in the wizard. Once finished, press F5 to refresh the interface and select the results table that was generated. The Data tab, shown in Figure 2, shows a record-by-record account of the prediction.

 

figure 2
Figure 2: Oracle Data Miner Predict


The 0s and 1s in the PREDICTION column are the predicted values for the FINISH column in the original data set. A value of 0 indicates a predicted loss, and a value of 1 means a predicted win.

Spreadsheet add-in implementation. The third implementation uses a spreadsheet add-in.

The spreadsheet add-in comes as a ZIP file that includes complete installation instructions, and the process is quite simple. The add-in itself is the Predictive_Analytics.xla file. To add it to the Microsoft Excel user interface, select Tools -> Add-Ins in Excel and click Browse . Select the Predictive_Analytics.xla file, click OK , select the Oracle Predictive Analytics check box, and click OK .

With the add-in selected, a new menu—OraclePA—becomes available. Figure 3 shows the menu.

 

figure 3
Figure 3: OraclePA menu


The spreadsheet add-in provides the ability to connect to the tables in the database for analysis or to analyze data directly in the spreadsheet.

To explain the data and predict the results by using the spreadsheet add-in, open the DogsPlayingPoker_Results.xls file (included in the DogsPlayingPoker.zip file) in Excel. Select OraclePA , and click Connect to create a new connection to your database.

To use Explain, perform the following steps: 

  1. Press ctrl-A to select all the data in the spreadsheet. (Note that it is possible to select a subset of the spreadsheet data to analyze, but in this case, include all of it.)

  2. Select OraclePA -> Explain.

  3. In the Explain dialog box, shown in Figure 4, select Excel as the datasource.

  4. Select $A$1:$G$5908 as the data for operation.

  5. Select 'CANINE_POKER_RESULTS'!$G:$G as the column to be explained.

  6. Click OK .

  7. In the Returning External Data to Excel dialog box, shown in Figure 5, select New worksheet .

  8. Click OK .

 

figure 4
Figure 4: Selecting data to explain


 

figure 5
Figure 5: Returning the data


The Explain results are added to a new worksheet in your spreadsheet. Once the data is in the spreadsheet, it is easy to graph the results. Highlight A1 through C4, and select Insert -> Chart .

Next Steps


 READ more about predictive analytics

DOWNLOAD
Oracle Data Miner
DBMS_PREDICTIVE_ANALYTICS spreadsheet add-in
sample data for this article

The steps for using Predict in the spreadsheet are similar to those required to use Predict in Oracle Data Miner.

To use Predict, perform the following steps: 

  1. Click the Canine Poker Results worksheet.

  2. Press ctrl-A to select all records.

  3. Select OraclePA -> Predict .

  4. In the Predict dialog box, shown in Figure 6, select Excel as the datasource.

  5. Select $A$1:$G$5908 as the data for operation.

  6. Select 'CANINE_POKER_RESULTS'!$A:$A as the case id column.

  7. Select 'CANINE_POKER_RESULTS'!$G:$G as the column to be predicted.

  8. Click OK .

  9. In the Returning External Data to Excel dialog box, select New worksheet .

  10. Click OK .

 

figure 6
Figure 6: Selecting data to predict


The Predict results are added to a new worksheet in your spreadsheet.

Figure 7, shows the Predict results. The overall predictive confidence is .57. This is the same value returned by the DBMS_PREDICTIVE_ANALYTICS.PREDICT procedure earlier in this article.

 

figure 7
Figure 7: Results of Predict


Adding New Records

The information gleaned from the Dogs Playing Poker data set shows that breed and age have no bearing on whether the entrant is a winner or a loser. The diet is the most critical, followed closely by the game being played. To illustrate, add the information in Listing 4 as the last two rows of the DogsPlayingPoker_Results.xls spreadsheet.

Code Listing 4: New data for spreadsheet implementation

 

5908    Standard Poodle         12      CANNED          YES     5 CARD DRAW
5909    Yorkshire Terrier       5       TABLE SCRAPS    YES     BLACK JACK


Rerun Predict, and scroll to the bottom of the new worksheet. Figure 8 shows the result.

 

figure 7
Figure 8: Predicting the outcome from the new information


So, the first entrant—5908, which eats canned dog food and plays five-card draw—is projected to win (value of 1) with 95 percent (0.953579664) certainty. The second entrant—5909, which eats table scraps and plays blackjack—is projected to lose with 98 percent certainty.

Conclusion

How much better is it to take action when you know the likely outcome, instead of simply hoping for the best and reporting results afterward?

Although the sample data in this article is not based on enterprise business data, it shows that predictive analytics really doesn't have any bounds, as long as the data is accurate and doesn't lack any major attributes. Predictive analytics has even made its way to television. Characters in the television show Numbers use math and statistics to solve crimes. Data mining, with an emphasis on predicting future actions based on patterns, is frequently used. Although the example in this article is completely fictional, predictive analytics is not. 


Ron Hardman works with Academy District 20 schools in Colorado Springs, Colorado, and is the founder of Peak Retrieval, LLC, a firm specializing in information retrieval technologies. He is coauthor of Oracle Database 10g PL/SQL Programming and Expert PL/SQL, both from Oracle Press, and is an Oracle ACE.

 



Send us your comments