|By Rick Hillegas, June 2009|
The next major release of Java DB, version 10.5.1.1, is now available for download. Java DB is Sun's supported distribution of the Apache Derby open-source database. Java DB is 100 percent Java technology and easy to use. Don't be fooled by its small JAR file (2.5M): This free database is fast, reliable, and packed with features. You can learn more about it on the Java DB web site.
The new 10.5.1.1 release of Java DB introduces performance boosts and some exciting features:
OFFSET/FETCH. Results can be chunked efficiently.
The rest of this article describes the new features: in-memory database, generated columns,
OFFSET/FETCH syntax, and SQL roles.
Version 10.5.1.1 introduces in-memory databases. Now you can run fast queries against complicated data that disappears when your JVM exits. In-memory databases are great for the following:
Creating an in-memory database is easy. Just add the
memory subprotocol to the JDBC connection URL. For instance, if your classpath contains the Derby JAR file (
derby.jar), here's how you create a JDBC connection to an in-memory database called
java.sql.Connection conn = java.sql.DriverManager.getConnection( "jdbc:derby: memory:patient;create=true" );
After that, you use the in-memory
patient database just like you would a disk-based database: Create tables, views, triggers, functions, and procedures. Insert, update, browse, and delete data. When your application exits, the database disappears, leaving no trace of itself on disk.
The in-memory database is an experimental feature in version 10.5.1.1. This means that the API may change when it is fully productized in the next major release. Try the new release and let the Java DB development team know how to improve it. The best way to reach the team is through the Derby community mailing list, specifically derby-user [at] db.apache.org.
Before turning to the other major features in version 10.5.1.1, let's set up Java DB for the examples that this article will show. These examples use a medical research application. In this application, data is collected on patients participating in a study. Because these are just examples, you take two shortcuts. First, you use a transient in-memory database so that you don't have to clean up when you are done.
Second, you use Java DB's
BUILTIN authentication scheme. The
BUILTIN authentication scheme is useful for writing tests like this, but you should not use this scheme in production. To run an application like this in production, use an on-disk database and plug in an LDAP-based authenticator as described in the
white paper on Java DB security.
To run these examples, download
version 10.5.1.1 of the Java DB engine and tools. Make sure that you wire the
derbytools.jar files into your classpath.
Start by creating a database that recognizes three users: Alice, Ruth, and an administrator:
java \ -Dderby.connection.requireAuthentication=true \ -Dderby.authentication.provider=BUILTIN \ -Dderby.user.admin=adminpassword \ -Dderby.user.alice=alicepassword \ -Dderby.user.ruth=ruthpassword \ -Dderby.database.sqlAuthorization=true \ -Dderby.stream.error.method=java.sql.DriverManager.getLogStream \ org.apache.derby.tools.ij
At this point, you are talking to the
ij program, Java DB's command-line tool for executing SQL statements. Now the administrator creates a database:
connect 'jdbc:derby:memory:patient;create=true;user=admin;password=adminpassword' as admin_conn;
Now that you have set up Java DB for this article's examples, let's go back to discussing the new 10.5.1.1 features.
Version 10.5.1.1 introduces generated columns. A
generated column is a column whose contents are computed from the values of other columns in the same row. This allows you to take expressions out of the
WHERE clause of a query, pre-compute those expressions, and index the pre-computed results so that your query runs faster. An example follows.
Remember that the administrator has just booted a database. Next, the administrator creates some tables to hold the patient data. One of the tables has a generated column, flagged here as the first line of bold type. Also note the index, flagged here as the second bold line:
create table privatePatientData ( patientID int primary key generated always as identity, name varchar( 32672 ) not null, address varchar( 32672 ) not null, telephoneNumber varchar( 100 ) not null ); create table publicPatientData ( patientID int primary key references privatePatientData( patientID ), postalCode varchar( 100 ), dateBorn date not null, heightInches double not null, weightPounds double not null, gender char(1), diseaseMetric int, heightWeightRatio generated always as ( (heightInches*heightInches) / weightPounds ) ); create index ppd_hwr_idx on publicPatientData( heightWeightRatio, diseaseMetric ); create index ppd_d_idx on publicPatientData( diseaseMetric, patientID );
Let's take a look at the generated column and the bold text that follows it, the generation clause. The generation clause consists of some keywords followed by an expression in parentheses. The expression itself is built up out of other columns in the row and can be complicated. For instance, the expression in parentheses can involve system and user-defined functions.
Whenever you insert a new row into the
publicPatientData table, Java DB automatically computes the contents of the generated
heightWeightRatio column based on the values that you have inserted into the
weightPounds columns. Whenever you update the
weightPounds columns, Java DB automatically recalculates the value of the generated column. The only way to change the contents of the generated column is to alter the value of a column that it depends on.
Remember the index
ppd_hwr_idx, flagged on the second bold line? That index, combined with the generated column, makes it easy for Java DB to calculate how patient health correlates with the patient's height-weight ratio. The following query can take advantage of the index and therefore run fast:
select diseaseMetric, count(*) outcome from publicPatientData where heightWeightRatio < 0.4 group by diseaseMetric;
Version 10.5.1.1 also introduces the
OFFSET/FETCH syntax, which was recently added to the ANSI/ISO SQL standard. This syntax lets you focus attention on a small page of results without having to read a table all the way to the end. Some other databases support this functionality through the nonstandard
Suppose that, during the course of the medical study, researchers want to notify patients who are at high risk as measured by their
diseaseMetric. Several researchers can share this notification job by using queries like the following statements. These queries divide the high-risk patients into groups, and each researcher takes responsibility for a group. Note the new syntax flagged in
-- Page 1 select diseaseMetric, pub.patientID, name, telephoneNumber from publicPatientData pub, privatePatientData priv where pub.patientID = priv.patientID and diseaseMetric > 10 order by diseaseMetric, pub.patientID offset 0 rows fetch next 20 rows only; -- Page 2 select diseaseMetric, pub.patientID, name, telephoneNumber from publicPatientData pub, privatePatientData priv where pub.patientID = priv.patientID and diseaseMetric > 10 order by diseaseMetric, pub.patientID offset 20 rows fetch next 20 rows only;
In the preceding queries, the
OFFSET clause tells Java DB to skip over some number of rows before it returns results to the user. The
FETCH NEXT clause tells Java DB to stop reading data after giving the user the desired number of rows.
Finally, version 10.5.1.1 introduces roles to help simplify the management of fine-grained SQL permissions. Roles let you define packets of privileges and then grant a whole packet to a user. This ability gives application designers and database administrators several benefits:
In the following example, the administrator creates two roles:
tech_writer. Users with the
researcher role can view and update all information. Users with the
tech_writer role cannot change the data and cannot view sensitive, personal information:
create role researcher; create role tech_writer; grant all privileges on privatePatientData to researcher; grant all privileges on publicPatientData to researcher; grant select on publicPatientData to tech_writer;
Next, the administrator identifies Alice as a researcher and Ruth as a tech writer:
grant researcher to alice; grant tech_writer to ruth;
Now Alice verifies that she can create and view data. Note the statement in bold type in the following code: Alice starts out explicitly stating that she is going to perform researcher tasks. This can be surprising at first. If an application relies on SQL roles, then its users have to explicitly state what job they are performing at any given time:
connect 'jdbc:derby:memory:patient;user=alice;password=alicepassword' as alice_conn; set role researcher; -- Alice can insert data into these tables insert into admin.privatePatientData ( name, address, telephoneNumber ) values ( 'Martha Murphy', '29 Primrose Lane/St. Paul, Minnesota', '123-456-7890' ) ; insert into admin.publicPatientData values ( ( select patientID from admin.privatePatientData where name = 'Martha Murphy' ), '55109', date('1955-02-23'), 66, 150, 'F', 7, default ); -- Alice can view all of the data select * from admin.privatePatientData; select * from admin.publicPatientData;
Similarly, Ruth verifies that she can read only public data and that she cannot change any data. In the following example, the comments flag several spots where you will see errors when you run this script through the
connect 'jdbc:derby:memory:patient;user=ruth;password=ruthpassword' as ruth_conn; -- Ruth is not allowed to set herself to a role not already granted to her. set role researcher; -- but she can set herself to this role set role tech_writer; -- she does not have permission to insert into the tables insert into admin.privatePatientData ( name, address, telephoneNumber ) values ( 'XXX', 'XXXX', 'XXX-XXX-XXXX' ) ; -- she can't read the private table select * from admin.privatePatientData; -- but she can read the public table select * from admin.publicPatientData;
That concludes the examples. Now shut down the
This concludes the overview of the main features of Java DB 10.5.1.1: in-memory database, generated columns,
OFFSET/FETCH, and SQL roles.