Java
Java DB
Abstract
This white paper explores table functions, a feature introduced in Java DB release 10.4.1.3.
Rich and expressive, SQL is the world's most popular database query language. Table functions let you take SQL outside its traditional home in the RDBMS, setting SQL loose on data in the wild.
Table of Contents
Writing Our Own Table Function 8
Quick Review 12
Wrapping 13
Extending EnumeratorTableFunction 13
Extending FlatFileTableFunction 13
Extending StringColumnTableFunction 14
Extending TemplateTableFunction 14
Declaring 15
Invoking 16
Views 17
Summary 18
Table functions are just what their name implies, viz., functions which return tabular data sets. These tabular data sets can then be queried just like ordinary tables, via the full, expressive power of SQL. For this reason, table functions are sometimes called virtual tables. The data returned by these special functions can come from anywhere:
files - files and web resources
collections - in-memory collections
foreign data - other databases, including non-relational sources
streams - transient information streams, including data feeds and device outputs
It's easy to write table functions. All you have to do is:
wrap – code a JDBC ResultSet to wrap your data
declare – register your ResultSet in Java DB
invoke – from then on treat your foreign data like a table in SQL queries
This white paper presents some sample table functions and shows you how to write your own.
The author would like to thank Michelle Caisse, Francois Orsini, and Dag Wanvik for their help in reviewing early drafts of this tutorial.
Suppose we wanted to import data directly from another database. We would start out by wrapping the foreign data in a public static method which returns a ResultSet. All of the hard work happens in the other database:
package com.acme.hrSchema;
import java.sql.*;
public class EmployeeTable
{
public static ResultSet read()
throws SQLException
{
Connection conn = DriverManager.getConnection
( "jdbc:mysql://localhost/hr?user=root&password=mysql-passwd" );
PreparedStatement ps = conn.prepareStatement( "select * from hrSchema.EmployeeTable" );
return ps.executeQuery();
}
}
Next, we would put the compiled class on Java DB's classpath and declare a Java DB table function (here done through Java DB's ij query tool):
connect 'jdbc:derby:myDatabase;create=true';
create function employeeTable()
returns table
(
employeeID int,
firstName varchar( 50 ),
lastName varchar( 50 ),
birthday date
)
language java
parameter style DERBY_JDBC_RESULT_SET
no sql
external name 'com.acme.hrSchema.EmployeeTable.read';
Finally, we would use the foreign data in Java DB queries. For instance, here's how we would invoke the function to siphon the foreign data into a local Java DB table:
connect 'jdbc:derby:myDatabase;create=true'; insert into employee_table select * from table( employeeTable() ) s;
Let's recap what just happened here:
wrap – First we created a public static method which returns a JDBC ResultSet. A table function is just a public static method which returns java.sql.ResultSet.
declare – Next, we told Java DB where the table function lived.
invoke – From then on, we were able to use the foreign data just as though it were a local Java DB table. We did this by invoking our function inside a table constructor in the FROM list of our query.
Now let's try some fancier SQL. We'll use a table function supplied in the tar file that accompanies this white paper. For more information about this table function, please see the accompanying javadoc. This simple table function presents a Java properties file as a table with two columns: id and text. We're going to suppose that your Java application follows the common convention of storing its user-visible messages in properties files consisting of text lines which have the form 'messageID=messageText' and that you have separate localized versions of these files for each language that your application supports. For more information on this coding convention, see the PropertyResourceBundle javadoc. We're going to run a table function against the message files in your product and ask the question: “What recently added English text still needs to be translated into Spanish?”
Because we're using a table function which has already been written, we can skip the wrap step. That is, the accompanying jar file already supplies a public static method which returns the ResultSet we need. We just have to make sure that our classpath contains that jar file. We declare the table function:
connect 'jdbc:derby:myDatabase;create=true';
create function properties( fileName varchar( 32672 ) )
returns table
(
id varchar( 50 ),
text varchar( 1000 )
)
language java
parameter style DERBY_JDBC_RESULT_SET
no sql
external name 'sun.javadb.vti.example.PropertiesTableFunction.properties';
Then we invoke the table function. In order to answer our question, we look for message ids which appear in the English message file but not the Spanish message file:
connect 'jdbc:derby:myDatabase;create=true'; select id from table( properties( '/MySourceCode/messages_en.properties' ) ) english where id not in ( select id from table( properties( '/MySourceCode/messages_es.properties' ) ) spanish );
Now let's try a grouped aggregate, one of the more powerful features of SQL. We'll ask the question: “What are the sizes of the packages in my jar file?” We're going to take advantage of the fact that a jar file is just a special kind of zip file. Again, we'll skip the wrap step because we will use a table function supplied in the accompanying tar file. This table function loops through the entries in a java.util.zip.ZipFile, treating each java.util.zip.ZipEntry as a separate row. First, we declare the table function:
connect 'jdbc:derby:myDatabase;create=true';
create function zipFile
( fileName varchar( 32672 ) )
returns table
(
name varchar( 100 ),
directory varchar( 200 ),
comment varchar( 1000 ),
compressed_size bigint,
crc bigint,
size bigint,
modification_time bigint
)
language java
parameter style DERBY_JDBC_RESULT_SET
no sql
external name 'sun.javadb.vti.example.ZipFileTableFunction.zipFile';
Then we invoke the table function, summing up the sizes of all files per directory. This answers our question:
connect 'jdbc:derby:myDatabase;create=true'; select directory, sum( compressed_size ) package_size from table ( zipFile( '/MyApplication/lib/product.jar' ) ) s group by directory order by package_size desc;
Here's the partial output of this query when we look at the derby.jar file supplied with Java DB release 10.4.1.3:
|
DIRECTORY |
PACKAGE_SIZE |
|
org/apache/derby/impl/sql/compile |
542718 |
|
org/apache/derby/impl/sql/execute |
260740 |
|
org/apache/derby/impl/store/raw/data |
133072 |
|
org/apache/derby/iapi/types |
114583 |
Writing Our Own Table Function
The last example used ZipFileTableFunction . That class extends EnumeratorTableFunction, another helper class in the accompanying tar file. EnumeratorTableFunction helps you present any Java collection as a table. Like all of the classes in the accompanying tar file, EnumeratorTableFunction is an implementation of ResultSet. You can extend EnumeratorTableFunction to make a table out of any of the following types of objects:
java.util.Enumeration
java.util.Iterator
java.lang.Iterable – including any kind of java.util.Collection.
arrays
EnumeratorTableFunction treats each Object in your collection as a row in the returned table. Your class which extends EnumeratorTableFunction just needs to supply the following:
constructor – The constructor declares the names of the columns in the returned ResultSet. The constructor is handed the Iterable that it will traverse.
entry point – This is the ResultSet-returning public static method which you register with Java DB. The entry point invokes the constructor and returns a new instance of your class.
row maker – The makeRow() method, which you implement, is handed an Object from the collection. This method turns the Object into an array of Strings, one for each exposed field in your Object. There should be one cell for each of the column names you supplied above. EnumeratorTableFunction's next() method will call your makeRow() method in order to turn an Object into the next row..
It is really quite simple. For instance, here's how easy it is to write a table function which wraps the collection of Locales supported by the VM. Note the call to setEnumeration() in the constructor: this is how we pass the collection of Locales to EnumeratorTableFunction's machinery. When the table function is invoked, EnumeratorTableFunction loops through the collection, calling our makeRow() method in order to turn each Object in the collection into a row:
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Locale;
import sun.javadb.vti.core.EnumeratorTableFunction;
public class LocaleTableFunction extends EnumeratorTableFunction
{
public LocaleTableFunction() throws SQLException
{
super( new String[] { "country", "language", "variant", "country_code", "language_code", "variant_code" } );
setEnumeration( Locale.getAvailableLocales() );
}
public static ResultSet locales() throws SQLException { return new LocaleTableFunction(); }
public String[] makeRow( Object obj ) throws SQLException
{
int col = 0;
Locale locale = (Locale) obj;
String[] row = new String[ getColumnCount() ];
row[ col++ ] = locale.getDisplayCountry();
row[ col++ ] = locale.getDisplayLanguage();
row[ col++ ] = locale.getDisplayVariant();
row[ col++ ] = locale.getCountry();
row[ col++ ] = locale.getLanguage();
row[ col++ ] = locale.getVariant();
return row;
}
}
We declare this table function. The column names we declare here DON'T have to match the names in LocaleTableFunction's constructor. However, the number of column names should agree. Note also that Java DB will truncate the strings returned by LocaleTableFunction if they exceed the lengths declared here:
connect 'jdbc:derby:myDatabase;create=true';
create function locales()
returns table
(
country varchar( 50 ),
language varchar( 50 ),
variant varchar( 50 ),
country_code varchar( 2 ),
language_code varchar( 2 ),
variant_code varchar( 50 )
)
language java
parameter style DERBY_JDBC_RESULT_SET
no sql
external name 'LocaleTableFunction.locales';
Now we invoke the table function. We list the supported languages, counting how many variants each has:
connect 'jdbc:derby:myDatabase;create=true'; select language, count(*) as variant_count from table ( locales() ) s where country <> '' group by language order by count(*) desc;
|
LANGUAGE |
VARIANT_COUNT |
|
Spanish |
20 |
|
Arabic |
17 |
|
English |
11 |
|
French |
5 |
|
German |
4 |
|
Chinese |
4 |
Let's step back a moment and quickly review what we've learned so far:
A table function, like any other Java DB function, is just a public static method. What distinguishes table functions is their return values. Table functions return JDBC ResultSets rather than scalar values. These ResultSets wrap external data. The external data could come from anywhere—it's up to the developer who writes the ResultSet.
Also like other Java DB functions, a table function must be declared using the CREATE FUNCTION statement. The static method's class must be visible on the database classpath.
You invoke your table function in the FROM clause of SQL queries. You can SELECT from a table function as though it were a table.
The next three sections discuss wrapping, declaring, and invoking in greater detail.
You have very little work to do as long as you extend one of the classes supplied in the accompanying tar file. The more refined the supplied class, the less work you have to do. Here's an overview of that work, proceeding from most to least refined class.
Extending EnumeratorTableFunction
We have already seen the code for LocaleTableFunction, a class which extends EnumeratorTableFunction . Here's how you extend EnumeratorTableFunction directly:
constructor – Write a constructor which calls the superclass constructor, supplying column names. The constructor should call setEnumeration (), handing it a collection, i.e., an Enumeration, Iterator, Iterable, or array.
entry point – Write a public static method which calls the constructor and returns a new instance. This is the method which you will declare to Java DB later on.
row maker – Implement the abstract method makeRow() . This method turns an Object in the collection into an array of String fields, one for each column in the ResultSet.
ZipFileTableFunction , supplied with the accompanying tar file, is another example of a class which extends EnumeratorTableFunction .
Extending FlatFileTableFunction
This is a base class for table functions which read files of structured records. Here's how you extend FlatFileTableFunction :
constructor - Write a constructor which calls the superclass constructor, supplying column names.
entry point – Write a public static method which calls the constructor and returns an instance. This is the method which you will declare to Java DB later on.
row maker – Implement the abstract method parseRow() . Your parseRow() method can advance through the file a line at a time by calling FlatFileTableFunction 's readLine() method. Once a whole record has been gobbled up, your parseRow() method returns the record as an array of Strings, one cell for each column in the returned row.
Extending StringColumnTableFunction
The previous two helper classes are fairly refined. This means that you have very little work to do if you want to extend them. In particular, those two helper classes implement the ResultSet.next() and ResultSet .getXXX() methods. Those methods are the basic machinery for advancing to the next row in the external data set and then retrieving its columns. The next helper class we will describe, StringColumnTableFunction, is a superclass of the previous classes. StringColumnTableFunction still does a lot of work for you, but extending it involves a little more work than extending the previous classes did.
StringColumnTableFunction is a superclass of table functions which can represent their rows as String arrays. Its concrete table function subclasses, however, are not limited to String types. StringColumnTableFunction knows how to cast Strings to other datatypes and it implements the ResultSet getXXX() methods for all of the datatypes which Java DB supports. This in turn lets you declare columns as numeric, date/time, and binary types. For instance, take a look at the CREATE FUNCTION statement in the preceding “More Fancy SQL” section. That sample table function mixes String and integer column types. For examples of how to extend StringColumnTableFunction , see its concrete subclass PropertiesTableFunction . Here's how you extend StringColumnTableFunction :
constructor – Write a constructor which calls the superclass constructor, supplying column names.
entry point – Write a public static method which calls the constructor and returns a new instance. This is the method which you will declare to Java DB later on.
positioning – Implement the next() method, advancing the table function to the next row of external data.
column reading – Implement the getRawColumn() method. This method is handed a column index and returns the String value of the corresponding column in the current row.
cleanup – Implement the close() method, releasing resources.
Extending TemplateTableFunction
Finally, we describe the most abstract of the helper classes provided in the accompanying tar file. This class, TemplateTableFunction , provides stubs for almost all of the methods in the Java 6 version of the ResultSet interface. If you need to compile this class for an older version of Java (jdk1.4 or Java 5), just remove the stubs in the code block which begins with the comment “JAVA 6 COMPATIBLE BEHAVIOR”. To extend TemplateTableFunction , you must provide real implementations for the methods which Java DB will invoke as it loops through your table function. In most situations, this will only be a handful of methods. Here's how you extend TemplateTableFunction :
entry point – Write a public static method which returns a new instance. This is the method which you will declare to Java DB later on.
positioning – Implement the next() method, advancing the table function to the next row of external data.
accessors – Override the getXXX() stubs for the datatypes returned by your table function. For instance, if your table function declares character and integer columns, then you should override the getString() and getInt() stubs.
finish – Implement the close() method, releasing resources.
So far, we have explored the details of wrapping external data in a table function. Next, you need to make your table function visible to Java DB. Your table function class must appear on the database classpath. The simplest way to do this is to just include the class on the VM classpath. You can also store the class inside a jar file in the database. For details on how to do this, please consult the section titled “Loading classes from a database” in Java DB's Developer's Guide .
Next you must declare the table function via the CREATE FUNCTION statement. Let's revisit the example from the previous “More Fancy SQL” section. The parts in black are constant boilerplate included in all table function declarations. The colored parts vary depending on your table function:
create function
zipFile
(
fileName varchar( 32672 ) )
returns table
(
name varchar( 100 ),
directory varchar( 200 ),
comment varchar( 1000 ),
compressed_size bigint,
crc bigint,
size bigint,
modification_time bigint
)
language java
parameter style DERBY_JDBC_RESULT_SET
no sql
external name '
sun.javadb.vti.example.ZipFileTableFunction.zipFile'
Here's what the variable parts mean:
name – In this case zipfile . This is the name that you will use when you invoke the function in SQL queries.
arguments – In this case fileName varchar( 32672 ) . These are the names and Java DB types of the arguments to your public static method.
returned columns – These are the names and Java DB types of the columns in your ResultSet.
sql access – In this case no sql . For truly external data, this is always no sql . However, if your function reads data out of the session's Java DB database, then this should be reads sql data . If your function writes to the session's Java DB database, then this should be contains sql .
method name – In this case sun.javadb.vti.example.ZipFileTableFunction.zipFile . This is the package, class, and method name of your public static method which returns a ResultSet.
In the previous two sections, we explored how you wrap external data in a table function and then declare that function to Java DB. Java DB treats a declared function like other permanent objects such as tables and views. Now you can use your function in a query just like a table. You simply need to invoke your table function in the FROM clause of your query via a table constructor clause. Here again is the query from the previous “More Fancy SQL” section. The part in bold black is the constant table constructor boilerplate. It is simply the keyword table plus a set of parentheses which enclose the function call. The colored parts vary depending on your table function:
select
directory
, sum(
compressed_size
) package_size
from
table (
zipFile
('/MyApplication/lib/product.jar' )
)
s
group by
directory
order by package_size desc
Here's what the variable parts mean:
columns – In this case directory and compressed_size . These are the names of the returned columns which you defined when you declared the table function originally.
name – In this case zipFile . This is the name which you gave your table function when you declared it.
arguments – In this case ( '/MyApplication/lib/product.jar' ) . These are the argument values which are passed to your public static method.
correlation – In this case s . This is a short name for the table returned by your function. You use this, like other SQL correlation names, to clarify your meaning when the same column name appears in more than one table.
Sometimes you find that you are invoking the same table function with the same arguments repeatedly. In this situation, you can simplify your queries by defining helper views. For instance, the following helper views would have been useful in the previous “Fancier SQL” section:
connect 'jdbc:derby:myDatabase;create=true'; create view english as select * from table( properties( '/MySourceCode/messages_en.properties' ) ) s; create view spanish as select * from table( properties( '/MySourceCode/messages_es.properties' ) ) s;
Such views would let us express the query in that section compactly:
select id from english where id not in (select id from spanish);
In this whitepaper we explored table functions, a feature introduced in Java DB release 10.4.1.3. Table functions let you run expressive SQL queries against arbitrary data. It's easy to write table functions. All you have to do is:
wrap – Write a public static method which returns a JDBC ResultSet. This is particularly simple if the enclosing class extends one of the helper classes supplied in the accompanying jar file. The returned ResultSet loops through your external data, making rows out of records or objects.
declare – Use the CREATE FUNCTION statement to tell Java DB where your public static method lives.
invoke – From then on, query your external data just like a table or view in your Java DB database.
For more information on table functions, please see the section titled “Programming Derby-style table functions” in Java DB's Developer's Guide .
The accompanying tar file contains a number of classes which help you write your own table functions. Here is the class hierarchy inside that tar file. Yellow classes are abstract. Blue classes are concrete:
|
||||||||||||