| Developer: BI
Using the Oracle Data Mining API
by Agust Egilsson
Learn how to safely and directly let end users access Oracle's analytic
APIs.
Published August 2006
Following best practices, Oracle releases PL/SQL and Java application-programming
interfaces (APIs) well before there is a point-and-click tool or builder
available taking advantage of a newly released API. This practice enables
developers to incorporate new functionality into their applications immediately
and allows their organizations to benefit accordingly.
However, considerable development is required to create specialized
tools that take advantage of a newly released API. As a consequence, the
lag time following the release and before an intuitive end-user tool utilizing
the API is available is several months in the best case and, more realistically,
one or two years. In the meantime, taking advantage of the API may require
organizations either to come up with their own tools or more commonly
with ad hoc solutions, such as embedding calls to the API in report generators
and spreadsheets.
One common approach is to wrap the API in several scripts and then access
the scripts from a spreadsheet using custom buttons and menus. However,
a major disadvantage with this approach is that today’s spreadsheets
handle text and numbers only as values in individual cells and are therefore
usually not the appropriate tools for dealing with the new functionality.
In this article, I will show how to quickly incorporate analytic and
other APIs into a spreadsheet platform from which the end user can easily
access the new code. As a guiding example, I will explain how I repackage
the Oracle Data Mining (PL/SQL) API as a Java API and how it is then accessed
from a spreadsheet called J Cells. J Cells is written entirely in Oracle
JDeveloper. It is not limited to only allowing text and numbers as values
of cells but instead it also handles Java objects as values and can access
any Java API directly from its cells, for immediate deployment.
The Spreadsheet Platform
I use the spreadsheet interface but allow the user to create any Java
object in the cells as well as to use primitive Java types. Each cell
can be used as a variable in another cell, and the user can choose between
writing Java code directly in the cells and using other formats. Bridging
the spreadsheet interface and the use of objects instead of just numbers
and text in a normal spreadsheet is automatic: J Cells calculates an indication
value for each object that fits into a cell, and this indication value
gives the user a sufficient clue about the object being displayed. Additionally,
a full value system is implemented that displays the object in various
other formats when requested—for example, when the user double-clicks
on a given cell. Since, even in a spreadsheet, formulas can be complicated
to define, the system also recognizes if the object being created has
a wizard associated with it. A wizard is usually a graphical
code generator specific to an object type. The use of wizards in J Cells
will become clear later in our example.
Figure 1 shows the spreadsheet interface for our example.

Figure 1: J Cells’ spreadsheet interface
The Data Mining API
Oracle supports two compatible APIs for accessing data mining functionality
in the database. The first is a PL/SQL API, which includes the DBMS_DATA_MINING
package, and there is also a Java API called Oracle Data Mining Java API.
Since J Cells is currently best suited to access a Java API, the PL/SQL
API needs to be packaged in such a way that it can be accessed directly
from Java. The two principal Oracle Data Mining concepts are settings
and models. The settings concept is basically built around settings
tables with two columns: setting_name and setting_value, where setting_name
is the name of an attribute used by a mining algorithm and setting_value
is the value corresponding to that attribute.
The DBMS_DATA_MINING package contains several procedures, including
CREATE_MODEL and APPLY. The CREATE_MODEL procedure creates a mining model
for a given mining function and dataset, based on the values in the settings
table that is supplied as one of the arguments to the procedure. The procedure
is simple and straightforward to use. Essentially, the user provides the
names of the model to be created, the mining function to be used, the
table containing the data to be used, the columns to be modeled, and the
settings table.The advantage of this approach is that
all the different algorithms are invoked in a similar way. The fine-tuning
of each algorithm is incorporated into settings tables, but in many cases
the various setting coefficients can be determined automatically by the
algorithms themselves. The complexity of entries in a settings table varies
depending on the algorithm and on the expertise of the user. Many expert
users may want to manually set all possible coefficients, while most of
us greatly appreciate the system’s ability to figure out the appropriate
settings automatically. Oracle provides a list of constants to be used
as settings keys, and values are either named constants or numeric intervals.
Table 1: Values for the algo_name (algorithm name) settings key
| Values |
Algorithm Description |
|
algo_adaptive_bayes_network
|
Adaptive Bayes network algorithm (classification)
|
|
algo_ai_mdl
|
Minimum description length algorithm (attribute importance)
|
|
algo_apriori_association_rules
|
Apriori algorithm (association)
|
|
algo_decision_tree
|
Decision tree algorithm (classification)
|
|
algo_kmeans
|
k-means algorithm (clustering)
|
|
algo_naive_bayes
|
Naive Bayes algorithm (classification)
|
|
algo_nonnegative_matrix_factor
|
Non-negative matrix factorization algorithm (feature selection)
|
|
algo_o_cluster
|
O-Cluster algorithm (clustering)
|
|
algo_support_vector_machines
|
Support vector machine algorithm (classification or regression)
|
Oracle’s constant values for the algorithm name (algo_name) key
are listed above. For each of these values, a different set of possible
keys and values is used, and so on. Below (Figure 2) it is shown how a
wizard function maps these keys into a tree structure and allows the user
to define the settings table by manipulating the settings tree.
As the mining model has been created in the Oracle database, the DBMS_DATA_MINING.APPLY
procedure is used to apply the model to new datasets. Again, this is a
simple-to-use procedure that requires as input only the name of the mining
model, the name of a table containing the new dataset, a column for identifying
rows in the new dataset, and the name of the resulting dataset. The Java
class OracleMiningModel (below) utilizes the APPLY procedures whenever
a prediction, score, or apply method is called. Additionally, the DBMS_DATA_MINING
package contains several functions that return details of each model based
on type, as result sets or in XML format. These detail functions are also
accessed by using instances of the OracleMiningModel class, which represent
different models in the database.
Packaging the (PL/SQL) settings concept in Java is achieved here by
creating a Java class called OracleModelSettings having flexible constructors
with various signatures, including
public OracleModelSettings ( String modelSettingsName,
Connection databaseConnection,
String[] keyToValueStringMap)
throws SQLException
The keyToValueStringMap is simply an array of strings of the form “<key>
-> <value>”. This array specifies the rows of the settings
table, and the class is responsible for maintaining the settings table
in the Oracle database.
Similarly, packaging the model concept in Java may be achieved by creating
a Java class called OracleMiningModel having constructors, again, with
various signatures, including
public OracleMiningModel ( String modelName,
OracleModelSettings oms,
String[] keyToValueMappings,
boolean recreate)
throws SQLException
Here the keyToValueMappings array is used to determine the algorithm
used and other named attributes required to create the data mining model
in the Oracle database. The purpose of the class is to create and maintain
the data mining model. The OracleMiningModel class additionally defines
methods that are used to retrieve and apply the model to new datasets.
These methods include the following, just to name a few.
public OracleResultSet infoAprioriAssociationRules(int topn)
public OracleResultSet infoAprioriFrequentItemsets(int topn)
public OracleResultSet infoAdaptiveBayesNetwork()
public OracleResultSet infoAIMinimumDescLength()
public OracleResultSet infoKMeans()
public OracleResultSet infoNaiveBayes()
public OracleResultSet infoNonnegativeMatrixFactorization()
public OracleResultSet infoOCluster()
public OracleResultSet infoSupportVectorMachines()
public XMLType infoDecisionTree()
public Object getPrediction(String[] signature, double[] doubleVal)
public HashMap score(String[] signature, double[] doubleVal)
public OracleResultSet apply( String dataTable,
String caseID,
String resultTable,
String schema,
boolean overwrite)
All the methods may throw SQL exceptions in addition to having the signature
above. Once the data mining functionality is manageable from these two
simple classes, the spreadsheet platform may be invoked to access any
of the available data mining algorithms in order to model datasets in
the Oracle database.
A Data Mining Example
So let’s look at a small data mining application written in this
system that accesses the Oracle database to create and run an ODM (Oracle
Data Mining) regression model. The purpose of this regression model is to
predict heart rate based on inputs such as blood pressure, height, and
weight. Using J Cells, one accesses the Java API directly to instantiate
objects and to invoke methods on the objects. The first thing to do is
to connect to the Oracle database. A DataSource object may be instantiated
by entering the formula
() = ~ OracleDataSource("agust","agust","dbVaio","vaioFS");
into cell b3 in the spreadsheet. Here the tilde symbol (~) indicates
a shorthand notation and allows J Cells to translate the (right hand)
statement into the constructor “new cell.OracleDataSource( "agust",
"agust", "dbVaio", "vaioFS");” giving
the system access to the database “dbVaio” on server “vaioFS”
as user “agust”.
Obtaining a database connection and inspecting the source data in the
database may now be achieved by invoking the proper methods on the DataSource
object, such as by entering the following formulas in cells b4 and b5,
respectively:
(*) = b3.getConnection();
(*) = b3.query("select * from pulse_clinical");
The first statement returns a java.sql.Connection object into cell b4,
and the second statement returns a java.sql.ResultSet object into cell
b5. The result set can be inspected by simply double-clicking on the cell
(b5), which opens up the resulting table in a table frame for viewing.
So far, I have just created a few simple data objects in the spreadsheet.
Now it is time to invoke the data mining API in order to define a settings
object and then to create a simple data mining model. First the settings
object is created by entering the following statement in cell b6:
(*) = new cell.odm.OracleModelSettings("xyz_settings", b4,
new String[]{
"algo_name -> algo_support_vector_machines",
"svms_kernel_function -> svms_linear"} );
Immediately I see a problem with this formula being end user–friendly,
so it may be a good idea to register a wizard with J Cells that generates
this formula automatically after prompting the user. It is typical for
a spreadsheet to assist users when creating complex formulas, so users
may expect to be guided when instantiating complex objects.
The deployed wizard is shown in Figure 2.

Figure 2: A typical wizard interface
Similarly, the data mining model is created by using a wizard or by
typing the formula; in either case, the resulting model is instantiated
in cell b7 specifying a call to the API directly:
(*)= new cell.odm.OracleMiningModel("xyz_model", b6,
new String[]{
"data_table_name -> pulse_clinical",
"mining_function -> regression",
"target_column_name -> pulse",
"case_id_column_name -> subject"},
false );
Instantiating the data mining model using this formula results in the creation
of a standard Oracle Data Mining model in the Oracle database. The model
can be inspected by double-clicking on cell b7, resulting in the full value
for the model being displayed as shown in Figure 3.

Figure 3: Inspecting the model in cell b7
Applying the model to datasets in Oracle Database is done by using the
OracleMiningModel methods described previously. As a simple and interactive
scoring of the model—typical for a spreadsheet application—the
user may want to enter blood pressure, height, and weight values and have
the database predict the heart rate using the model just defined. The
Java API method getPrediction defined on model objects is well suited
for this purpose. After having entered the input values in cells e4, e5,
e6, and e7 (in the order specified by the signature array in cell b8),
the scoring is done by entering the following formula:
(*) = b7.getPrediction(b8,new double[]{e4,e5,e6,e7});
Again, this formula accesses the Java API directly to obtain and display
the scoring result into cell e9 as shown in Figure 4.

Figure 4: Applying the model to input values
The Spreadsheet Advantage
It is somewhat baffling how popular the spreadsheet is with end users.
On the one hand, today’s most common spreadsheet systems have long
frustrated many developers, who are used to more-flexible and -powerful
systems. For the nondeveloper, on the other hand, there are some clear
advantages to using spreadsheet systems: not having to build graphical
user interfaces, building and testing each piece of formula (code) individually,
and hiding the formulas and seeing instead the much simpler results of
calculations. These advantages are only present in today’s popular
spreadsheets when working with formulas that return numbers or text into
cells, which is a major limitation with the systems. Here, I have demonstrated
how to remove this limitation and create a more powerful tool. It can
then be used to directly access Oracle’s data mining functionality
as well as other APIs.
Conclusion
By employing a more powerful spreadsheet, it is possible to significantly
reduce the time it requires to introduce new technology releases to end
users, such as the Java APIs and PL/SQL APIs published by Oracle. In fact,
using the ideas described here, it is often feasible to hand a raw Java
API directly to the nonprogrammer for immediate incorporation into decision-making
processes or for prediction and analysis.
Agust Egilsson (egilsson@hi.is) is a mathematics professor, an experienced Oracle database expert, and
an avid PL/SQL and Java developer.
|