FAQ - Berkeley DB Java Edition


updated: December 3, 2007

General
· Can Berkeley DB Java Edition use a NFS, SAN, or other remote/shared/network filesystem for an environment?
· Can a Berkeley DB database be used by Berkeley DB Java Edition?
· Does JE support high performance LOBs (large objects)?
· Does JE support key prefixing or key compression?
· How can I set JE configuration properties?
· How can insertion-ordered records or sequences be used?
· How do I add fields to an existing tuple format when using the Java bindings?
· How do I build a simple Servlet using Berkeley DB Java Edition?
· How do I verify that the configuration settings that I made in my je.properties file have taken effect?
· How does JE Concurrent Data Store (CDS) differ from JE Transactional Data Store (TDS)?
· Is a Berkeley DB database the same as a SQL "table"?
· Is it considered best practice to keep databases closed when not in use?
· What is the smallest cache size I can set with JE?
· Why don’t Berkeley DB and Berkeley DB Java Edition both implement a shared set of Java interfaces for the API? Why are these two similar APIs in different Java packages?

Installation and Build
· Does Berkeley DB Java Edition run within J2ME?
· Where does the je.jar file belong when loading within an application server?
· How should I set directory permissions on the JE environment directory?

Troubleshooting
· How do I debug a lock timeout?
· NIO issues in JDK 1.4.2_04 or earlier
· What is a safe way to stop threads in a JE application?

Transactions
· JE 2.0 has support for XA transactions in a J2EE app server environment. Can I use XA transactions (2 phase commit) in a non-J2EE environment?

Querying
· How can I perform wildcard queries or non-key queries?
· How can I join two or more primary databases or indexes?
· How can a join be combined with a range query?
· How do I perform a custom sort of secondary duplicates?
· What is the best way to access duplicate records when not using collections?
· What's the best way to get a count of either all objects in a database, or all objects that match a join criteria?
· How do I prevent "phantoms" when not using transactions?

Performance
· Which are better: Private vs Shared Database instances?
· Are there any locking configuration options?
· How can I estimate my application's optimal cache size?
· How can I tune JE's cache management policy for my application's access pattern?
· How do I begin tuning performance?
· In JE, what are the performance tradeoffs when storing to more than one database?
· Is a larger cache always better for JE?
· What are JE read buffers and when should I change their size?
· What are JE write buffers and when should I change their size?
· Why is my application performance slower with transactions?

Disk Space Utilization
· What is so different about JE log files?
· How can I check the disk space utilization of my log files?
· How can I find the location of checkpoints within the log files?

Java Collections
· Earlier versions of the Java collections API required that iterators be explicitly closed. How can they be used with other components that do not close iterators?
· How do I access duplicates using the Java collections API?
· In earlier versions of the Java collections API, why did iterators need to be explicitly closed by the caller?

Direct Persistence Layer (DPL)
· What is the complete definition of object persistence?
· How do I define primary keys, secondary keys and composite keys?
· How do I store and query data in an EntityStore?
· How are relationships defined and accessed?
· What is the difference between embedding a persistent object and a relationship with another entity object?
· Why must all persistent classes (superclasses, subclasses and embedded classes) be annotated?
· Why doesn't the DPL use the standard annotations defined by the Java Persistence API?
· How do I dump and load the data in an EntityStore?
· What is Carbonado and when should I use it instead of the Direct Persistence Layer (DPL)?


Can Berkeley DB Java Edition use a NFS, SAN, or other remote/shared/network filesystem for an environment?

There are two caveats with NFS based storage, although the motivation for them in Java Edition (JE) is different from that of Berkeley DB.

First, JE requires that the underlying storage system reliably persist data to the operating system level when write() is called and durably when fsync() is called. However, some remote file system server implementations will cache writes on the server side (as a performance optimization) and return to the client (in this case JE) before making the data durable. While this is not a problem when the environment directory's disk is local, this can present issues in a remote file system configuration because the protocols are generally stateless. The problem scenario can occur if (1) JE performs a write() call, (2) the server accepts the data but does not make it durable by writing it to the server's disk, (3) the server returns from the write() call to the client, and then (4) the server crashes. If the client (JE) does not know that the server has crashed (the protocol is stateless), and then JE later successfully calls write() on a piece of data later in the log file, it is possible for the JE log file to have holes in it, causing data corruption.

In JE 3.2.65 and later releases, a new parameter has been added, called je.log.useODSYNC, which causes the JE environment log files to be opened with the O_DSYNC flag. This flag causes all writes to be written durably to the disk. In the case of a remote file system it tells the server not to return from the write() call until the data has been made durable on the server's local disk. The flag should never be used in a local environment configuration since it incurs a performance penalty. Conversely, this flag should always be used in a remote file system configuration or data corruption may result.

When using JE in a remote file system configuration, the system should never be configured with multiple file system clients (i.e. multiple hosts accessing the file system server). In this configuration it is possible for client side caching to occur which will allow the log files to become out of sync on the clients (JE) and therefore corrupt. The only solution we know of for this is to open the environment log files with the O_DIRECT flag, but this is not available using the Java VM.

Second, Java Edition (JE) uses the file locking functionality provided through java.nio.channels.FileChannel.lock(). Java does not specify the underlying implementation, but presumably in many cases it is based on the flock() system call. Whether flock() works across NFS is platform dependent. A web search shows several bug reports about its behavior on Linux where flock() is purported to incorrectly return a positive status.
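Whether FileChannel locking works at all on a particular mount can be checked empirically before relying on it. The following is a minimal sketch and not part of JE: the class name FileLockProbe and the scratch file name are hypothetical. A successful result on one host only shows that the call succeeds there, not that locks are honored across hosts.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class FileLockProbe {

    /**
     * Tries to take and release an exclusive lock on a scratch file in dir.
     * Returns false if another process already holds the lock.
     */
    static boolean canLock(Path dir) throws IOException {
        Path probe = dir.resolve("lock-probe.tmp"); // hypothetical scratch file
        boolean locked;
        try (FileChannel channel = FileChannel.open(probe,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock lock = channel.tryLock(); // null when held by another process
            locked = (lock != null);
            if (locked) {
                lock.release();
            }
        }
        if (locked) {
            Files.deleteIfExists(probe); // clean up only if nobody else is using it
        }
        return locked;
    }

    public static void main(String[] args) throws IOException {
        Path dir = args.length > 0
                ? Paths.get(args[0])
                : Files.createTempDirectory("lock-probe");
        System.out.println("lock acquired in " + dir + ": " + canLock(dir));
    }
}
```

To test the cross-host case, one would have to hold the lock from one host while running the probe from another, which this sketch does not automate.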

JE uses file locking for two reasons:

  1. To guarantee that only one writer process is attached to an environment, and that all other processes are read-only. (Note that JE supports multiple writer threads in a single process.)
  2. To guarantee that any file deletions that result from log cleaning are disabled as long as there is a reader process attached. This FAQ provides more information about log cleaning in the JE storage system.

Of course the simplest way of dealing with flock() vs NFS is to only use a single process to access a JE environment. If that is not possible, and if you cannot rely on flock() across NFS on your systems, you could handle (1) by taking responsibility in your application to ensure that there is a single writer process attached. Having two writer processes in a single environment could result in database corruption. (Note that the issue is with processes, and not threads.)

Handling the issue of log cleaning (2) in your application is also possible, but more cumbersome. To do so, you must disable the log cleaner (by setting the je.env.runCleaner property to false) whenever there are multiple processes accessing an Environment. If file deletion is not locked out properly, the reader processes might periodically see a com.sleepycat.je.log.LogFileNotFoundException, and would have to close and reopen to get a fresh snapshot. Such an exception might happen very sporadically, or might happen frequently enough to make the setup unworkable. To perform a log cleaning, the application should first ensure that all reader processes have closed the Environment (i.e. all read-only processes have closed all Environment handles). Once closed, the writer process should perform log cleaning by calling Environment.cleanLog() and Environment.checkpoint(). Following the completion of the checkpoint, the reader processes can re-open the environment.


Can a Berkeley DB database be used by Berkeley DB Java Edition?

We've had a few questions about whether data files can be shared between Berkeley DB and Berkeley DB Java Edition. The answer is that the on-disk format is different for the two products, so data files cannot be shared between them. Both products do share the same format for the data dump and load utilities (com.sleepycat.je.util.DbDump, com.sleepycat.je.util.DbLoad), so you can import and export data between the two products.

Also, JE data files are platform independent, and can be moved from one machine to another.

Lastly, both products support the persistent Java Collections API and a similar byte-array-based API. Currently, only Java Edition supports the POJO-based Direct Persistence Layer API.


Does JE support high performance LOBs (large objects)?

JE supports get() and put() operations with partial data. However, this feature is not fully optimized, since the entire record is always read or written to the database, and the entire record is cached.

So the only advantage (currently) to using partial get() and put() operations is that only a portion of the record is copied to or from the buffer supplied by the application. In the future we may provide optimizations of this feature, but until then we cannot claim that JE has high performance LOB support.

For more information on partial get() and put() operations please see our documentation.


Does JE support key prefixing or key compression?

Key prefixing is a database storage technique that is useful for applications with large keys which share similar prefixes, in order to reduce the space used to store the keys. JE does not presently support key prefixing, but a possible workaround is to segment key values into different databases within a single environment to mimic the effect of key prefixing. Note that Berkeley DB (C) does support key prefixing.

JE also does not currently support key compression. While we have thought about it for both the DB and JE products, there are issues with respect to the algorithm that is used, the size of the key, and the actual values of the key. For example, LZW-style compression works well, but needs a lot of bytes to compress to be effective. If you're compressing individual keys, and they're relatively small, LZW-style compression is likely to make the key bigger, not smaller.
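The size effect described above can be observed with the JDK alone. This is a minimal sketch that uses DEFLATE (via java.util.zip.Deflater) as a stand-in for LZW-style compression, since the JDK does not ship an LZW codec; the class name and sample key are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class KeyCompressionSketch {

    /** Returns the number of bytes produced by DEFLATE-compressing the input. */
    static int compressedLength(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[256];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer); // count output bytes, discard content
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        byte[] shortKey = "user:12345".getBytes(StandardCharsets.UTF_8); // hypothetical key
        byte[] longRepetitive = new byte[10_000]; // all zero bytes
        System.out.println(shortKey.length + " bytes -> " + compressedLength(shortKey));
        System.out.println(longRepetitive.length + " bytes -> " + compressedLength(longRepetitive));
    }
}
```

The short key comes out larger than it went in because of the fixed framing overhead, while the long repetitive input shrinks substantially.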


How can I set JE configuration properties?

JE configurations can be programmatically specified through classes such as com.sleepycat.je.EnvironmentConfig, DatabaseConfig, StatsConfig, TransactionConfig, and CheckpointConfig. The application instantiates one of these configuration classes and sets the desired values.

There's a second configuration option, which is the je.properties file. Any configuration set through the get/set methods in the config classes can also be specified by creating a je.properties file in the environment home directory. Properties set through je.properties take precedence, and give the application the option of changing configurations without recompiling the application.
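A je.properties file uses the ordinary Java properties format, so it can be written by hand or generated. The following is a minimal sketch using only java.util.Properties; the class name, the temporary directory standing in for the environment home, and the property values are illustrative assumptions, while the property names are ones mentioned elsewhere in this FAQ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

class JePropertiesSketch {

    /** Writes the given name=value pairs to je.properties in envHome. */
    static Path write(Path envHome, Properties props) throws IOException {
        Path file = envHome.resolve("je.properties");
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "JE overrides (illustrative values)");
        }
        return file;
    }

    /** Reads je.properties back, e.g. to review what has been overridden. */
    static Properties read(Path envHome) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(envHome.resolve("je.properties"))) {
            props.load(in);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        Path envHome = Files.createTempDirectory("je-env"); // stand-in for a real environment home
        Properties props = new Properties();
        props.setProperty("je.maxMemory", "10485760");  // illustrative value, in bytes
        props.setProperty("je.log.useODSYNC", "false"); // illustrative value
        System.out.println("wrote " + write(envHome, props));
        System.out.println("read back " + read(envHome));
    }
}
```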

The complete set of JE properties is documented in <jeHome>/example.properties. There are more properties available in example.properties than are exposed through the set/get methods of EnvironmentConfig. Some properties are experimental and may be phased out over time; others may be promoted to set/get methods of EnvironmentConfig, or have their default values modified as we get more feedback on what the optimal settings are. All properties can also be set through EnvironmentConfig.setConfigParam().

Each property in the example.properties file comes with a description and is set to the default value, like this:

# If true (the default), use an LRU-only policy to select nodes for
# eviction. If false, select by Btree level first, and then by LRU.
# je.evictor.lruOnly=true
# (mutable at run time: false)

The last line indicates whether the property must be set before the environment is opened, or whether it can be changed afterwards. Only mutable properties can be changed while the environment is open. For example, the je.evictor.lruOnly and je.evictor.nodesPerScan properties must be set before the environment is opened, while je.maxMemory can be changed while the environment is open.


How can insertion-ordered records or sequences be used?

The general capability for assigning IDs is a "sequence", and has the same functionality as a SQL SEQUENCE. The idea of a sequence is that it allocates values efficiently (without causing a performance bottleneck), and guarantees that the same value won't be used twice.

The class com.sleepycat.je.Sequence provides that functionality, and an example case is in <jeHome>/examples/je/SequenceExample.java.


How do I add fields to an existing tuple format when using the Java bindings?

If you are currently storing objects using a TupleBinding, it is possible to add fields to the tuple without converting your existing databases and without creating a data incompatibility. Please note also that class evolution is supported without any application level coding through the Direct Persistence Layer API.

This excerpt from the Collections Overview in the Javadoc describes the changes that may be made to tuple bindings:

The tuple binding uses less space and executes faster than the serial binding. But once a tuple is written to a database, the order of fields in the tuple may not be changed and fields may not be deleted. The only type evolution allowed is the addition of fields at the end of the tuple, and this must be explicitly supported by the custom binding implementation.

Specifically, if your type changes are limited to adding new fields then you can use the TupleInput.available() method to check whether more fields are available for reading. The available() method is the implementation of java.io.InputStream.available(). It returns the number of bytes remaining to be read. If the return value is greater than zero, then there is at least one more field to be read.

When you add a field to your database record definition, in your TupleBinding.objectToEntry method you should unconditionally write all fields including the additional field.

In your TupleBinding.entryToObject method you should call available() after reading all the original fixed fields. If it returns a value greater than zero, you know that the record contains the new field and you can read it. If it returns zero, the record does not contain the new field.

For example:


public Object entryToObject(TupleInput input) {
    // Read all original fields first, unconditionally.
    if (input.available() > 0) {
        // Read additional field #1
    }
    if (input.available() > 0) {
        // Read additional field #2
    }
    // etc
}
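The same pattern can be tried end to end without JE on the classpath. This is a minimal sketch that uses java.io.DataInputStream and DataOutputStream as stand-ins for TupleInput and TupleOutput (it relies only on available(), which the text above notes is the java.io.InputStream method); the class name, the record layout, and the added email field are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class EvolvingRecordSketch {

    /** Old writer: the original fields only (name, age). */
    static byte[] writeOldFormat(String name, int age) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(name);
            out.writeInt(age);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** New writer: unconditionally writes all fields, the added one last. */
    static byte[] writeNewFormat(String name, int age, String email) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(name);
            out.writeInt(age);
            out.writeUTF(email); // added field goes at the end
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reader: returns {name, age, email}; email is null for old records. */
    static String[] read(byte[] record) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
            String name = in.readUTF();   // original fields, read unconditionally
            int age = in.readInt();
            String email = null;
            if (in.available() > 0) {     // bytes remain: the new field is present
                email = in.readUTF();
            }
            return new String[] { name, String.valueOf(age), email };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", read(writeNewFormat("Bob", 41, "bob@example.com"))));
        System.out.println(read(writeOldFormat("Ann", 30))[2]); // null: field absent in old record
    }
}
```

Records written before the field was added remain readable, which is why no database conversion is needed.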


How do I build a simple Servlet using Berkeley DB Java Edition?

Below is a simple Servlet example that uses JE. It opens a JE Environment in the init() method, reads all the data out of it in the doGet() method, and closes it in destroy().

import java.io.*;
import java.util.*;
import javax.servlet.*;
import javax.servlet.http.*;

import com.sleepycat.je.Cursor;
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.DatabaseException;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;
import com.sleepycat.je.LockMode;
import com.sleepycat.je.OperationStatus;

/**
 * The simplest possible servlet.
 */
public class HelloWorldExample extends HttpServlet {

    private Environment env = null;
    private Database db = null;

    /** Opens the JE environment once, when the servlet is loaded. */
    public void init(ServletConfig config)
        throws ServletException {

        super.init(config);
        try {
            openEnv("c:/temp");
        } catch (DatabaseException e) {
            e.printStackTrace(System.out);
            throw new UnavailableException(e.toString());
        }
    }

    public void doGet(HttpServletRequest request,
                      HttpServletResponse response)
        throws IOException, ServletException {

        ResourceBundle rb =
            ResourceBundle.getBundle("LocalStrings", request.getLocale());
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();

        out.println("<html>");
        out.println("<head>");

        String title = rb.getString("helloworld.title");

        out.println("<title>" + title + "</title>");
        out.println("</head>");
        out.println("<body bgcolor=\"white\">");
        out.println("<a href=\"../helloworld.html\">");
        out.println("<img src=\"../images/code.gif\" height=24 " +
                    "width=24 align=right border=0 alt=\"view code\"></a>");
        out.println("<a href=\"../index.html\">");
        out.println("<img src=\"../images/return.gif\" height=24 " +
                    "width=24 align=right border=0 alt=\"return\"></a>");
        out.println("<h1>" + title + "</h1>");
        dumpData(out);
        out.println("</body>");
        out.println("</html>");
    }

    /** Closes the JE environment when the servlet is unloaded. */
    public void destroy() {
        closeEnv();
    }

    /** Prints every key/data pair in the database, plus the elapsed time. */
    private void dumpData(PrintWriter out) {
        try {
            long startTime = System.currentTimeMillis();
            out.println("<pre>");
            Cursor cursor = db.openCursor(null, null);
            try {
                DatabaseEntry key = new DatabaseEntry();
                DatabaseEntry data = new DatabaseEntry();
                while (cursor.getNext(key, data, LockMode.DEFAULT) ==
                       OperationStatus.SUCCESS) {
                    out.println(new String(key.getData()) + "/" +
                                new String(data.getData()));
                }
            } finally {
                // Always release the cursor, even if iteration fails.
                cursor.close();
            }
            long endTime = System.currentTimeMillis();
            out.println("Time: " + (endTime - startTime));

            out.println("</pre>");
        } catch (DatabaseException e) {
            out.println("Caught exception: ");
            e.printStackTrace(out);
        }
    }

    private void openEnv(String envHome)
        throws DatabaseException {

        EnvironmentConfig envConf = new EnvironmentConfig();
        env = new Environment(new File(envHome), envConf);

        DatabaseConfig dbConfig = new DatabaseConfig();
        dbConfig.setReadOnly(true);
        db = env.openDatabase(null, "testdb", dbConfig);
    }

    private void closeEnv() {
        try {
            // Close the database before the environment that owns it.
            // Either may be null if init() failed part-way through.
            if (db != null) {
                db.close();
            }
            if (env != null) {
                env.close();
            }
        } catch (DatabaseException e) {
            // Record the failure instead of silently discarding it.
            log("Error closing JE environment", e);
        }
    }
}


How do I verify that the configuration settings that I made in my je.properties file have taken effect?

You can use the Environment.getConfig() API to retrieve configuration information after the Environment has been created. For example:


import java.io.File;
import com.sleepycat.je.*;

public class GetParams {
    public static void main(String[] argv)
        throws Exception {

        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setTransactional(true);
        envConfig.setAllowCreate(true);
        Environment env = new Environment(new File("/temp"), envConfig);
        // Ask the open Environment for its effective configuration.
        EnvironmentConfig newConfig = env.getConfig();
        System.out.println(newConfig.getCacheSize());
        env.close();
    }
}

will display the effective cache size, for example:

> java GetParams
7331512
>

Note that you have to call getConfig() on the open Environment, rather than query the EnvironmentConfig that was used to create it.


How does JE Concurrent Data Store (CDS) differ from JE Transactional Data Store (TDS)?

Berkeley DB, Java Edition comes in two flavors, Concurrent Data Store (CDS) and Transactional Data Store (TDS). The difference between the two products lies in whether you use transactions or not. Literally speaking, you are using TDS if you call the public API method, EnvironmentConfig.setTransactional(true).

Both products support multiple concurrent reader and writer threads, and both create durable, recoverable databases. We're using "durability" in the database sense, which means that the data is persisted to disk and will reappear if the application comes back up after a crash. What transactions provide is the ability to group multiple operations into a single, atomic element, the ability to undo operations, and control over the granularity of durability.

For example, suppose your application has two databases, Person and Company. To insert new data, your application issues two operations, one to insert into Person, and another to insert into Company. You need transactions if your application would like to group those operations together so that the inserts only take effect if both operations are successful.

Note that an additional issue is whether you use secondary indices in JE. Suppose you have a secondary index on the address field in Person. Although it only takes one method call into JE to update both the Person database and its secondary index Address, the application needs to use transactions to make the update atomic. Otherwise, it's possible that if the system crashed at a given point, Person could be updated but not Address.

Transactions also let you explicitly undo a set of operations by calling Transaction.abort(). Without transactions, all modifications are final after they return from the API call.

Lastly, transactions give you finer-grained durability. After calling Transaction.commit(), the modification is guaranteed to be durable and recoverable. In CDS, without transactions, the database is guaranteed to be durable and recoverable back to the last Environment.sync() call, which can be an expensive operation.

Note that there are different flavors of Transaction.commit that let you trade off levels of durability and performance, explained in this FAQ entry.

So in summary, choose CDS when:

  • Transactional data protection is not required.
  • The application does not need a guarantee that secondary indices are consistent with primary indices.
  • The application does not need fine grained durability.

Choose TDS when:

  • Full transactional semantics, including the ability to transactionally protect groups of operations, is a requirement.
  • Recovery of committed data is a requirement.
  • Secondary indices are used and must be guaranteed to be in sync with primary indices.

There is a single download and jar file for both products. Which one you use is a licensing issue, and has no installation impact.


Is a Berkeley DB database the same as a SQL "table"?

Yes; "tables" are databases, "rows" are key/data pairs, and "columns" are application-encapsulated fields. The application must provide its own methods for accessing a specific field, or "column" within the data value.
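As an illustration of application-encapsulated fields, the following minimal sketch packs two "columns" into a single byte array data value and reads each one back. The layout (a 4-byte id, then a length-prefixed UTF-8 name) and all names are hypothetical; in a real JE application the bindings API would normally do this work.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class RowFieldsSketch {

    /** Packs two "columns" (id, name) into one data value. */
    static byte[] encode(int id, String name) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + 4 + nameBytes.length)
                .putInt(id)               // bytes 0..3: id
                .putInt(nameBytes.length) // bytes 4..7: length of name
                .put(nameBytes)           // bytes 8.. : name
                .array();
    }

    /** Accessor for the "id" column. */
    static int getId(byte[] data) {
        return ByteBuffer.wrap(data).getInt(0);
    }

    /** Accessor for the "name" column. */
    static String getName(byte[] data) {
        int length = ByteBuffer.wrap(data).getInt(4);
        return new String(data, 8, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] row = encode(7, "Ada");
        System.out.println(getId(row) + " " + getName(row));
    }
}
```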


Is it considered best practice to keep databases closed when not in use?

The memory overhead for keeping a database open is quite small. In general, it is expected that applications will keep databases open for as long as the environment is open. The exception may be an application that has a very large number of databases and only needs to access a small subset of them at any one time.

If you notice that your application is short on memory because you have too many databases open, then consider only opening those you are using at any one time. Pooling open database handles could be considered at that point, if the overhead of opening databases each time they are used has a noticeable performance impact.
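One way to pool handles is a small least-recently-used cache that closes the handle it evicts. This is a minimal sketch under stated assumptions, not a JE facility: the handle type is any AutoCloseable, the opener function is supplied by the application, and FakeHandle exists only to make the example runnable.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for a database handle. */
class FakeHandle implements AutoCloseable {
    final String name;
    boolean closed = false;
    FakeHandle(String name) { this.name = name; }
    @Override public void close() { closed = true; }
}

class HandlePoolSketch<H extends AutoCloseable> {
    private final Function<String, H> opener;
    private final LinkedHashMap<String, H> open;

    HandlePoolSketch(final int maxOpen, Function<String, H> opener) {
        this.opener = opener;
        // accessOrder=true keeps the least recently used entry first.
        this.open = new LinkedHashMap<String, H>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, H> eldest) {
                if (size() > maxOpen) {
                    closeHandle(eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    /** Returns an open handle for name, opening it if necessary. */
    synchronized H get(String name) {
        H handle = open.get(name);
        if (handle == null) {
            handle = opener.apply(name);
            open.put(name, handle); // may evict and close the eldest handle
        }
        return handle;
    }

    private static void closeHandle(AutoCloseable handle) {
        try {
            handle.close();
        } catch (Exception e) {
            throw new IllegalStateException("failed to close handle", e);
        }
    }

    public static void main(String[] args) {
        HandlePoolSketch<FakeHandle> pool = new HandlePoolSketch<>(2, FakeHandle::new);
        FakeHandle a = pool.get("a");
        pool.get("b");
        pool.get("c"); // evicts and closes "a"
        System.out.println("a closed: " + a.closed);
    }
}
```

A real pool would also have to make sure an evicted handle is not still in use by another thread, which this sketch does not attempt.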


What is so different about JE log files?

JE log files are not like the log files of other database systems. Nor are they like the log files or database files of Berkeley DB Core (C) Edition.

  1. JE log files are "append only". Record insertions, deletions, and updates are always added at the end of the current file. The first file is named "00000000.jdb". When that file grows to a certain size (10 MB by default) a new file named "00000001.jdb" becomes the current file, and so on.
  2. There are no separate database files. Unlike Berkeley DB C Edition, databases are not stored in files that are separate from the transaction log. The transaction log and the database records are stored together in a single sequential log consisting of multiple log files.
  3. The JE Cleaner is responsible for reclaiming unused disk space. When the records in a log file are superseded by deletions or updates recorded in a later log file, the older log file is no longer fully utilized. The Cleaner, which operates by default as a separate thread, determines the least utilized log files, copies any still utilized records in those files to the end of the current log file, and finally deletes the now completely un-utilized log file.
  4. Cleaning does not start immediately and never produces 100% utilization. Until you have written enough data to create several log files, and some of that data is obsoleted through deletions and updates, you won't notice any log files being deleted by the cleaner. By default cleaning occurs in the background and maintains the log files at 50% utilization. You can configure a higher utilization value, but configuring too high a utilization value will reduce overall performance.
  5. Cleaning is not automatically performed when closing the environment. If you wish to reduce unused disk space to a minimum at a particular point in time, you must explicitly call a method to perform log cleaning. See the Environment.cleanLog method for more information.
  6. Log file deletion only occurs after a checkpoint. The Cleaner prepares log files to be deleted, but file deletion must be performed after a checkpoint to ensure that the files are no longer referenced. Checkpoints occur on their own schedule, which is every 20 MB of log written, by default. This is part of the reason that you won't see log files being deleted until after several files have been created.
  7. Read-only processes which access an environment will deter log file cleaning, since JE needs to ensure that the read-only process sees a snapshot of the data consistent with the point in time when the environment was opened.
  8. Long running transactions can also deter log file cleaning, since JE needs to ensure that enough data is preserved so that the database can return to the pre-transaction state in the event of a transaction abort. Log cleaning is not permitted within the span of log which contains active transactions.
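The utilization figure in point 4 translates into a rough disk estimate: at a given utilization, total log size is approximately the live data size divided by that utilization. The following is a back-of-the-envelope sketch only; the class and method names are hypothetical, the 1 GB of live data is invented for illustration, and 10 MB is assumed here to mean 10,000,000 bytes.

```java
class LogUtilizationSketch {

    /** Rough total log size needed to hold liveBytes at the given utilization. */
    static long estimatedLogBytes(long liveBytes, int utilizationPercent) {
        if (utilizationPercent <= 0 || utilizationPercent > 100) {
            throw new IllegalArgumentException("utilization must be in 1..100");
        }
        return liveBytes * 100 / utilizationPercent;
    }

    /** Number of log files of the given size needed for totalBytes (rounded up). */
    static long estimatedFileCount(long totalBytes, long fileSizeBytes) {
        return (totalBytes + fileSizeBytes - 1) / fileSizeBytes;
    }

    public static void main(String[] args) {
        long liveBytes = 1_000_000_000L;              // hypothetical: about 1 GB of live data
        long total = estimatedLogBytes(liveBytes, 50); // default 50% utilization
        long files = estimatedFileCount(total, 10_000_000L);
        System.out.println(total + " bytes in roughly " + files + " log files");
    }
}
```

Actual usage fluctuates around such an estimate, since cleaning and file deletion lag behind writes as described in points 4 and 6.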

You'll find more about log files in the Getting Started Guide.


What is the smallest cache size I can set with JE?

The smallest cache size is 96KB (96 * 1024). You can set this by either calling EnvironmentConfig.setCacheSize(96 * 1024) on the EnvironmentConfig instance that you use to create your environment, or by setting the je.maxMemory property in your je.properties file.


Why don’t Berkeley DB and Berkeley DB Java Edition both implement a shared set of Java interfaces for the API? Why are these two similar APIs in different Java packages?

In the past, we've discussed whether it makes sense to provide a set of interfaces that are implemented by the Berkeley DB JE API and the Java API for Berkeley DB. We looked into this during the design of JE and decided against it because in general it would complicate things for "ordinary" users of both JE and DB.

  • There are architectural differences between the two products that require that applications are cognizant of which product is being used. For example, in DB database names consist of a (filename, database name) pair; since JE doesn't have database files, only a database name is required.
  • The constants classes like LockMode and OperationStatus could be faked in a common package, but there is no guarantee that they will be the same between products and releases.
  • Classes that applications construct directly (like DatabaseEntry) are also problematic: we could have common interfaces and a factory in a common package, but that doesn't allow for subclassing and presents problems for callbacks like SecondaryKeyCreate.
  • The exception classes are more problematic: even if we moved DatabaseException into the common package (breaking applications in the process), applications using the common interfaces would need to explicitly catch exceptions from both packages. Otherwise, we would need to unify what exceptions are thrown from DB and JE, and given that DB exceptions are generated based on C error codes, there is no way we would ever get that right.


Does Berkeley DB Java Edition run within J2ME?

JE requires Java SE 1.4.2 or later. There are no plans to support J2ME at this time.


Where does the je.jar file belong when loading within an application server?

It is important that je.jar and your application jar files—in particular the classes that are being serialized by SerialBinding—are loaded under the same class loader. For running in a servlet, this typically means that you would place je.jar and your application jars in the same directory.

Additionally, it is important to not place je.jar in the extensions directory for your JVM. Instead place je.jar in the same location as your application jars. The extensions directory is reserved for privileged library code.

One user with a WebSphere Studio (WSAD) application had a classloading problem because the je.jar was in both the WEB-INF/lib and the ear project. Removing the je.jar from the ear project resolved the problem.


How do I debug a lock timeout?

The common cause of a com.sleepycat.je.DeadlockException is the situation where two or more transactions are deadlocked because each is waiting on a lock that another holds. For example:

  • transaction 1 has a lock on record A, and wants an exclusive (write) lock on record B.
  • transaction 2 has a lock on record B, and wants an exclusive (write) lock on record A.

The lock timeout message may give you insight into the nature of the contention. Besides the default timeout message, which lists the contending lockers, their transactions, and other waiters, it's also possible to enable tracing that will display the stacktraces of where locks were acquired.

Stacktraces can be added to a deadlock message by setting the je.txn.deadlockStackTrace property through your je.properties file or EnvironmentConfig. This should only be set during debugging because of the added memory and processing cost.

Enabling stacktraces gives you more information about the target of contention, but it may be necessary to also examine what locks the offending transactions hold. That can be done through your application's knowledge of current activity, or by setting the je.txn.dumpLocks property. Setting je.txn.dumpLocks will make the deadlock exception message include a dump of the entire lock table, for debugging. The output of the entire lock table can be large, but is useful for determining the locking relationships between records.
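As a sketch, both debugging properties named above could be enabled in je.properties as follows. This assumes both take boolean values, as their descriptions suggest; they should be removed again once the contention has been diagnosed.

```properties
# Debugging only: these add memory and processing cost.
je.txn.deadlockStackTrace=true
je.txn.dumpLocks=true
```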

Another note, which doesn't impact the deadlock itself, is that the default setting for lock timeouts (specified by je.lock.timeout or EnvironmentConfig.setLockTimeout()) can be too long for some applications with contention, and throughput improves when this value is decreased. However, this issue only affects performance, not true deadlocks.


NIO issues in JDK 1.4.2_04 or earlier

In Sun's JDK 1.4.2_04 and earlier there were some problems around garbage collecting for direct buffers. This bug is described in this Sun bug report and was fixed in 1.4.2_05 and later. In JE versions before 2.0.83, java.nio was used by default and occasionally users would report this stack trace:


java.lang.OutOfMemoryError
at java.nio.Bits.reserveMemory(Bits.java:618)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:95)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:285)
at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:54)
at sun.nio.ch.IOUtil.read(IOUtil.java:205)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:591)
at com.sleepycat.je.log.FileReader.fillReadBuffer(FileReader.java:645)
...

Some JE users were able to fix the problem by moving to a later JVM patch release.

As of JE 2.0.83, JE no longer uses java.nio by default. NIO can be enabled by setting je.log.useNIO to true (the default is false), but in most cases it does not seem to provide much of a performance advantage.


What is a safe way to stop threads in a JE application?

Calling Thread.interrupt() is not recommended for an active JE thread if the goal is to stop the thread or do thread coordination. If you interrupt a thread which is executing a JE operation, the state of the database will be undefined. That's because JE might have been in the middle of I/O activity when the operation was aborted midstream, and it becomes very difficult to detect and handle all possible outcomes.

If JE can detect the interrupt, it will mark the environment as unusable and will throw a RunRecoveryException. This tells you that you must close the environment and re-open it before using it again. If JE doesn't throw RunRecoveryException, it is very likely that you would get some other exception that is less meaningful, or simply see corrupted data.

Instead, applications should use other mechanisms like Object.notify() and wait() to coordinate threads. For example, use a "keepRunning" variable of some kind in each thread. Check this variable in your threads, and return from the thread when it is false. Set it to false when you want to stop the thread. If this thread is waiting to be woken up to do another unit of work, use Object.notify() to wake it up. This is the recommended technique.

If you absolutely must interrupt threads for some reason, you should expect that you will see RunRecoveryException. Each thread should treat this exception as an indication that it should stop.
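
The flag-and-notify technique described above might look like the following minimal sketch. It uses only java.lang. The class name StoppableWorker, its methods, and the doUnitOfWork() placeholder (which stands in for your JE operations) are hypothetical names chosen for illustration. As a design choice in this sketch, the worker finishes any queued work before it exits.

```java
/** Worker body that stops via a flag and wait/notify, never Thread.interrupt(). */
final class StoppableWorker implements Runnable {
    private final Object lock = new Object();
    private boolean keepRunning = true; // guarded by lock
    private int pendingWork = 0;        // guarded by lock
    private int completedWork = 0;      // guarded by lock

    /** Queue one unit of work and wake the worker if it is waiting. */
    void submitWork() {
        synchronized (lock) {
            pendingWork++;
            lock.notify();
        }
    }

    /** Clear the flag and wake the worker so that it notices the change. */
    void requestStop() {
        synchronized (lock) {
            keepRunning = false;
            lock.notify();
        }
    }

    int completedWork() {
        synchronized (lock) {
            return completedWork;
        }
    }

    @Override
    public void run() {
        while (true) {
            synchronized (lock) {
                // Sleep until there is work to do or a stop was requested.
                while (keepRunning && pendingWork == 0) {
                    try {
                        lock.wait();
                    } catch (InterruptedException e) {
                        return; // interrupted while idle, outside any database call
                    }
                }
                if (pendingWork == 0) {
                    return; // keepRunning is false and nothing is queued
                }
                pendingWork--;
            }
            doUnitOfWork();
            synchronized (lock) {
                completedWork++;
            }
        }
    }

    /** Hypothetical placeholder for the application's database operations. */
    private void doUnitOfWork() {
    }

    /** Runs a worker, queues n units, stops it, and returns how many completed. */
    static int runDemo(int n) {
        StoppableWorker worker = new StoppableWorker();
        Thread thread = new Thread(worker);
        thread.start();
        for (int i = 0; i < n; i++) {
            worker.submitWork();
        }
        worker.requestStop();
        try {
            thread.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return worker.completedWork();
    }
}
```

Because the flag is only read and written while holding the same lock that is used for wait() and notify(), the worker cannot miss a stop request between checking the flag and going to sleep.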


JE 2.0 has support for XA transactions in a J2EE app server environment. Can I use XA transactions (2 phase commit) in a non-J2EE environment?

Yes. The com.sleepycat.je.XAEnvironment class implements the javax.transaction.xa.XAResource interface, which can be used to perform 2 phase commit transactions. The relevant methods in this interface are start(), end(), prepare(), commit(), and rollback(). The XAEnvironment.setXATransaction() is an internal entrypoint that is only public for the unit tests.

The XA Specification has the concept of implicit transactions (a transaction that is associated with a thread and does not have to be passed to the JE API); this is supported in JE 2.0. You can use the XAResource.start() method to create a JE transaction and join it to the calling thread. To disassociate a transaction from a thread, use the end() method. When you use thread-implied transactions, you do not have to pass in a Transaction argument to the JE API (e.g. through methods such as get() and put()). Instead, passing null in a thread-implied transaction environment tells JE to use the implied transaction. Note that there's a minor bug with JE Collections and thread-implied transactions that will be fixed in the next point release.

Here's a small example of how to use XAEnvironment and 2 Phase Commit:


XAEnvironment env = new XAEnvironment(home, null);
Xid xid = [something...];
env.start(xid, 0); // creates a thread-implied transaction for you
// ... calls to get/put, etc. with a null transaction arg will use the
// implicit transaction ...
env.end(xid, 0); // disassociate this thread from the implied transaction
env.prepare(xid);
if (commit) {
    env.commit(xid, false);
} else {
    env.rollback(xid);
}

Your application must create an Xid class that implements javax.transaction.xa.Xid in order to create a transaction identifier.
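
As a rough illustration of such a class, here is a minimal sketch that implements the three methods of the standard javax.transaction.xa.Xid interface. The class name SimpleXid and its constructor are assumptions made for this example, and how you choose the format id and identifier bytes is up to your application.

```java
import java.util.Arrays;
import javax.transaction.xa.Xid;

/** Minimal immutable Xid; the class name and constructor are illustrative. */
final class SimpleXid implements Xid {
    private final int formatId;
    private final byte[] globalTransactionId;
    private final byte[] branchQualifier;

    SimpleXid(int formatId, byte[] globalTransactionId, byte[] branchQualifier) {
        // The XA interface defines maximum sizes for both identifiers.
        if (globalTransactionId.length > Xid.MAXGTRIDSIZE
                || branchQualifier.length > Xid.MAXBQUALSIZE) {
            throw new IllegalArgumentException("identifier exceeds XA size limit");
        }
        this.formatId = formatId;
        // Defensive copies keep the identifier stable after construction.
        this.globalTransactionId = globalTransactionId.clone();
        this.branchQualifier = branchQualifier.clone();
    }

    @Override
    public int getFormatId() {
        return formatId;
    }

    @Override
    public byte[] getGlobalTransactionId() {
        return globalTransactionId.clone();
    }

    @Override
    public byte[] getBranchQualifier() {
        return branchQualifier.clone();
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Xid)) {
            return false;
        }
        Xid that = (Xid) other;
        return formatId == that.getFormatId()
                && Arrays.equals(globalTransactionId, that.getGlobalTransactionId())
                && Arrays.equals(branchQualifier, that.getBranchQualifier());
    }

    @Override
    public int hashCode() {
        return 31 * (31 * formatId + Arrays.hashCode(globalTransactionId))
                + Arrays.hashCode(branchQualifier);
    }
}
```

Value-based equals() and hashCode() are included on the assumption that the same logical transaction may be referred to by different Xid instances across start(), end(), prepare(), and commit().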


How can a join be combined with a range query?

Imagine an application where a single primary employee database has three fields that are indexed by secondary databases: status, department, salary. The user wishes to query for a specific status, a specific department, and a range of salary values. Berkeley DB supports joins, and the join API can be used to select the AND (intersection) of a specific status and a specific department. However, the join API cannot be used to select a range of salaries. Berkeley DB also supports range searches, making it possible to iterate over a range of values using a secondary index such as a salary index. However, there is no way to automatically combine a range search and a join.

To combine a range search and a join you'll need to first perform one of the two using a Berkeley DB API, and then perform the other manually as a "filter" on the results of the first. So you have two choices:

  1. Perform the range query using the Berkeley DB API. Iterate through the range, and manually select (filter) the records that meet your join qualifications.
  2. Perform the join using the Berkeley DB API. Iterate through the results of the join, and manually select the records that meet your range search qualifications.

Which option performs best depends on whether the join or the range query will produce a smaller result set, on average. If the join produces a smaller result set, use option 2; otherwise, use option 1.

There is a third option to consider if this particular query is performance critical. You could create a secondary index on preassigned ranges of salary. For example, assign the secondary key 1 for salaries between $10,000 and $19,999, key 2 for salaries between $20,000 and $29,999, etc.

If a query specifies only one such salary range, you can perform a join using all three of your secondary indices, with no filtering after the join. If the query spans ranges, you'll have to do multiple joins and then union the results. If the query specifies a partial range, you'll have to filter out the non-matching results. This may be quite complex, but it can be done if necessary. Before performing any such optimization, be sure to measure performance of your queries to make sure the optimization is worthwhile.

If you can limit the specified ranges to only those that you've predefined, that will actually simplify things rather than make them more complex, and will perform very well also. In this case, you can always perform a single join with no filtering. Whether this is practical depends on whether you can constrain the queries to use predefined ranges.
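
The preassigned-range idea can be sketched in plain Java as follows. This is only an illustration of the key arithmetic, not JE code: the class and method names are hypothetical, salaries are assumed to be whole dollars, and salaries below $10,000 are assumed to map to key 0, which the example above does not specify.

```java
/** Maps salaries to preassigned range keys using 10,000-wide buckets. */
final class SalaryBuckets {
    static final int BUCKET_WIDTH = 10_000;

    /** Secondary key for a salary: 10,000-19,999 maps to 1, 20,000-29,999 to 2, etc. */
    static int bucketKey(int salary) {
        if (salary < 0) {
            throw new IllegalArgumentException("negative salary");
        }
        return salary / BUCKET_WIDTH;
    }

    /** Bucket keys touched by the inclusive range [low, high]; one join per key. */
    static int[] bucketsForRange(int low, int high) {
        if (low > high) {
            throw new IllegalArgumentException("low > high");
        }
        int first = bucketKey(low);
        int last = bucketKey(high);
        int[] keys = new int[last - first + 1];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = first + i;
        }
        return keys;
    }

    /** True when [low, high] is not aligned to bucket edges, so results need filtering. */
    static boolean needsFilter(int low, int high) {
        return low % BUCKET_WIDTH != 0 || high % BUCKET_WIDTH != BUCKET_WIDTH - 1;
    }
}
```

A query for $20,000 to $29,999 touches a single bucket and needs no filtering, while a query for $15,000 to $34,999 touches three buckets (three joins to union) and needs a filter on the partial buckets.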

Range searches in general can be done with Cursor.getSearchKeyRange or with the SortedSet.subSet and SortedMap.subMap methods, depending on whether you are using the base API or the Collections API. It is up to you which to use.

If you use Cursor.getSearchKeyRange you'll need to call getNext to iterate through the results. You'll have to watch for the end range yourself by checking the key returned by getNext. This API does not have a way to enforce range end values automatically.

If you use the Collections API you can call subMap or subSet and get an Iterator on the resulting collection. That iterator will enforce both the beginning and the end of the range automatically.
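
The difference between the two styles can be illustrated with a plain java.util.TreeMap standing in for a salary index. This is a sketch of the iteration pattern only: it does not use JE classes, the class and method names are hypothetical, and a TreeMap holds one value per key, so duplicates in a real secondary index are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class RangeScanSketch {
    /**
     * Cursor style: position at the first key >= low, step forward, and stop
     * manually once the key passes high, as with getSearchKeyRange and getNext.
     */
    static List<String> cursorStyle(TreeMap<Integer, String> index, int low, int high) {
        List<String> results = new ArrayList<>();
        for (Map.Entry<Integer, String> entry : index.tailMap(low, true).entrySet()) {
            if (entry.getKey() > high) {
                break; // the caller enforces the end of the range
            }
            results.add(entry.getValue());
        }
        return results;
    }

    /** Collections style: the sub-map view enforces both ends of the range. */
    static List<String> subMapStyle(SortedMap<Integer, String> index, int low, int high) {
        // SortedMap.subMap excludes its upper bound, so high + 1 makes the range
        // inclusive (assumes high < Integer.MAX_VALUE).
        return new ArrayList<>(index.subMap(low, high + 1).values());
    }
}
```

Both methods return the same entries for the same inclusive range; the only difference is who checks the upper bound.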


How do I perform a custom sort of secondary duplicates?

If you have a secondary database with sorted duplicates configured, you may wish to sort the duplicates according to some other field in the primary record. Let's say your secondary key is F1 and you have another field in your primary record, F2, that you wish to use for ordering duplicates. You would like to use F1 as your secondary key, with duplicates ordered by F2.

In Berkeley DB, the "data" for a secondary database is the primary key. When duplicates are allowed in a secondary, the duplicate comparison function simply compares those primary key values. Therefore, a duplicate comparison function cannot be used to sort by F2, since the primary record is not available to the comparison function.

The purpose of key and duplicate comparison functions in Berkeley DB is to allow sorting values in some way other than simple byte-by-byte comparison. In general it is not intended to provide a way to order keys or duplicates using record data that is not present in the key or duplicate entry. Note that the comparison functions are called very often—whenever any Btree operation is performed—so it is important that the comparison be fast.

There are two ways you can accomplish sorting by F2:

  1. Instead of using F1 as the secondary key, use a concatenated key F1+F2 as the secondary key. When you wish to do a lookup by F1, use a range search (Cursor.getSearchKeyRange).
  2. Use F1 as the secondary key, as you are already doing. When you query the duplicates for F1, sort them manually by F2 after you query them. Since when you query the secondary you will have the primary record in hand, F2 will be available for sorting.

Option #1 has the advantage of automatically sorting by F2. However, you will never be able to do a join (via the Database.join method) on the F1 key alone. You will be able to do a join on the F1+F2 value, but it seems unlikely that will be useful.

Secondaries are often used for joins. Therefore, we recommend option #2 unless you are quite sure that you won't need to do a join on F1.

The trade-offs are:

  • Option #1 does not allow performing join on F1.
  • Option #1 has larger secondary keys (more overhead).
  • Option #2 requires programming the sort by F2 manually.
  • Option #2 requires enough memory to sort all duplicates for a given key F1.
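
Option #2 amounts to collecting the duplicates for one F1 value and sorting them in memory. A minimal sketch in plain Java follows; the Rec class, its field types (a String F1 and an int F2), and the method name are assumptions for illustration, and the list passed in stands for the primary records you have already read through the secondary.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class DuplicateSortSketch {
    /** Hypothetical primary record with a primary key and the fields F1 and F2. */
    static final class Rec {
        final int primaryKey;
        final String f1;
        final int f2;

        Rec(int primaryKey, String f1, int f2) {
            this.primaryKey = primaryKey;
            this.f1 = f1;
            this.f2 = f2;
        }
    }

    /**
     * Returns the duplicates for one F1 value ordered by F2. The input arrives
     * in primary key order, which is how the secondary orders its duplicates.
     */
    static List<Rec> sortByF2(List<Rec> duplicatesForF1) {
        List<Rec> sorted = new ArrayList<>(duplicatesForF1);
        sorted.sort(Comparator.comparingInt((Rec r) -> r.f2));
        return sorted;
    }
}
```

The copy made inside sortByF2 is where the memory cost listed in the trade-offs comes from: all duplicates for the given F1 value are held at once.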


What is the best way to access duplicate records when not using collections?

In general, to access duplicate records (that is, records that are in a single database and have the same key), you need to position a cursor at the desired key, and then retrieve all the subsequent duplicate records.

The Getting Started Guide has a good section on how to position your cursor: Search For Records and then how to retrieve the rest of the duplicates: Working with Duplicate Records.


What's the best way to get a count of either all objects in a database, or all objects that match a join criteria?

As with most btree based data stores, Berkeley DB Java Edition does not store record counts for non-duplicate records, so some form of internal or application based traversal is required to get the size of the result set. This is in general true of relational databases too; it's just that the count is done for you internally when the SQL count statement is executed. Berkeley DB Java Edition version 3.1.0 introduced a Database.count() method, which returns the number of all key/data pairs in the database. This method does an optimized, internal traversal, does not impact the working set in the cache, but may not be accurate in the face of concurrent modifications in the database.

To get a transactionally current count, or to count the result of a join, do this:

cursor = db.openCursor(...) OR db.join(someCursors);

count = 0;
while (cursor.getNext(...) == OperationStatus.SUCCESS) {
    count++;
}

There are a few ways to optimize an application-implemented count:

  1. Counts are stored for duplicate record sets. One can optimize counts for databases with duplicate records by using Cursor.count() which returns the number of records that share the current record's key value. Suppose you want to count all the records in a database that supports duplicates and contains 3000 records, but only 3 distinct keys. In that case, it would be far more efficient to do:
    count = 0;
    while (cursor.getNextNoDup(...) == OperationStatus.SUCCESS) {
        count += cursor.count();
    }
    because you will only look up 3 records (one for each key value), not 3000 records.
  2. If your database has large records, another option when counting is to call DatabaseEntry.setPartial(0, 0, true) on the key and data DatabaseEntry objects to reduce the overhead of returning large records.
  3. If you do not need a transactional count, using cursor.getNext(..., LockMode.DIRTY_READ) will be faster.
  4. If you are counting the entire database and Database.count() is not supported in your JE version, you can, under certain circumstances, also use the value of BtreeStats.getLeafNodeCount(), obtained from Database.getStats(). This returns a valid count of the number of records in the database, but because it is obtained without locks or transactions the count is only correct when the database is quiescent. In addition, although stats generation takes advantage of some internal code paths, it may consume more memory when analyzing large databases.


Which are better: Private vs Shared Database instances?

Using a single Database instance for multiple threads is supported, but turns out to present a minor bottleneck. The issue is that the Database object maintains a set of Cursors open against it. This set is used to check if all Cursors are closed against the Database when close() is called, but to do that JE has to synchronize against it before updating it. So if multiple threads are sharing the same Database handle it makes for a synchronization bottleneck. In a multi-threaded case, unless there's a good reason to share a Database handle, it's probably better to use separate handles for each thread.


Are there any locking configuration options?

JE 2.1.30 introduced two new performance motivated locking options.

No-locking mode is on one end of the spectrum. When EnvironmentConfig.setLocking(false) is specified, all locking is disabled, which relieves the application of locking overhead. No-locking should be used with care. It's only valid in a non-transactional environment and the application must ensure that there is no concurrent activity on the database. Concurrent activity while in no-locking mode can lead to database corruption. In addition, log cleaning is disabled in no-locking mode, so the application is responsible for managing log cleaning through explicit calls to the Environment.cleanLog() method.

On the other end of the spectrum is the je.lock.nLockTables property, which can specify the number of lock tables. While the default is 1, increasing this number can improve multithreaded concurrency. The value of this property should be a prime number, and should ideally be the nearest prime that is not greater than the number of concurrent threads.
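
The "nearest prime that is not greater than the number of concurrent threads" rule can be computed with a few lines of plain Java. This is a sketch of the arithmetic only; the class and method names are hypothetical, and falling back to 1 (the documented default) when there are fewer than two threads is an assumption of this example.

```java
/** Suggests a value for je.lock.nLockTables from the number of concurrent threads. */
final class LockTableSizing {
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    /** Largest prime <= concurrentThreads, or 1 (the default) if there is none. */
    static int lockTablesFor(int concurrentThreads) {
        for (int candidate = concurrentThreads; candidate >= 2; candidate--) {
            if (isPrime(candidate)) {
                return candidate;
            }
        }
        return 1;
    }
}
```

For example, an application with 16 concurrent threads would get 13 from this calculation, and one with 10 threads would get 7.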


How can I estimate my application's optimal cache size?

In JE 2.1.30 we included an undocumented utility, com.sleepycat.je.util.DbCacheSize. This utility takes record size and number of records for a given database and provides an estimate of its in-memory footprint.

DbCacheSize is undocumented because it doesn't yet support the use of duplicate records in a database, and because we may be changing the API from release to release. But even so, it can be a useful aid for general estimates.

See the header comments in <jeHome>/com/sleepycat/je/util/DbCacheSize.java for the full explanation of the utility's parameters, and how to interpret the results.


How can I tune JE's cache management policy for my application's access pattern?

JE, like most databases, performs best when database objects are found in its cache. The cache eviction algorithm is the way in which JE decides to remove objects from the cache and can be a useful policy to tune. The default cache eviction policy is LRU (least recently used) based. Database objects that are accessed most recently are kept within cache, while older database objects are evicted when the cache is full. LRU suits applications where the working set can stay in cache and/or some data records are used more frequently than others.

An alternative cache eviction policy was added in JE 2.0.83 that is instead primarily based on the level of the node in the Btree. This level based algorithm can improve performance for some applications with both of the following characteristics:

  1. Access by key is mostly random. In other words, there are few or no "hot" sets of records that are accessed more than others, and most record access is not in sequential key order.
  2. The memory size of the active record set is significantly larger than the configured JE cache size, causing lots of I/O as records are fetched from disk.

The alternative cache eviction policy is specified by setting two configuration parameters in your je.properties file or EnvironmentConfig object:

je.evictor.lruOnly=false
je.evictor.nodesPerScan=100

The level based algorithm works by evicting the lowest level nodes of the Btree first, even if higher level nodes are less recently used. In addition, dirty nodes are evicted after non-dirty nodes. This algorithm can benefit random access applications because it keeps higher level Btree nodes in the cache for as long as possible, which, for a random key, can increase the likelihood that the relevant Btree internal nodes will be in the cache.

We recommend that you also change the nodesPerScan property when you set lruOnly to false. This setting controls the number of Btree nodes that are considered, or sampled, each time a node is evicted. We have found in our tests that a setting of 100 produces good results. The larger the nodesPerScan, the more accurate the algorithm. However, don't set it too high. When considering larger numbers of nodes for each eviction, the evictor may delay the completion of a given database operation, which impacts the response time of the application thread.


How do I begin tuning performance?

Gathering environment statistics is a useful first step in JE performance tuning. Execute the following code snippet periodically to display statistics for the past period and to reset the statistics counters for the next display.

StatsConfig config = new StatsConfig();
config.setClear(true);

System.err.println(env.getStats(config));

The Javadoc for com.sleepycat.je.EnvironmentStats describes each field. Cache behavior can have a major effect on performance, and nCacheMiss is an indicator of how hot the cache is. You may want to adjust the cache size, data access pattern, or cache eviction policy and monitor nCacheMiss.

Applications which use transactions may want to check nFSyncs to see how many of these costly system calls have been issued. Experimenting with other flavors of commit durability, like TxnWriteNoSync and TxnNoSync can improve performance.

nCleanerRuns and cleanerBacklog are indicators of log cleaning activity. Adjusting the property je.cleaner.minUtilization can increase or decrease log cleaning. The user may also elect to do batch log cleaning, as described in the Javadoc for Environment.cleanLog(), to control when log cleaning occurs.

High values for nRepeatFaultReads and nRepeatIteratorReads may indicate non-optimal read buffer sizes. See the FAQ entry on configuring read buffers.


In JE, what are the performance tradeoffs when storing to more than one database?

A user posted a question about the pros and cons of using multiple databases. The question was:

We are designing a application where each system could handle at least 100 accounts. We currently need about 10 databases. We have three options we are considering.

Option 1) Put all databases in the same environment. This would mean at least 1000 databases in one environment. Does this cause a problem for BDB JE? And how far is this scalable?

Option 2) Give each account their own environment. This would limit the amount of databases per environment, but we would have 100+ environments on one system. Would 100+ environments consume to much of the Java resources?

Option 3) One environment, 10 databases, and prefix all entry keys with an account id. We prefer to avoid this option due to some of the functionality we gain by having separate databases or environments. With option 1/2 we can restore/move/rename one account easily. Don't have to worry about keeping data separate since each accounts have their own databases. However, if this provides the best performance we would look at implementing it.

A JE environment shares a variety of resources among different databases, such as cache and background threads. Option (2) is currently the least desirable because it would consume resources in an inflexible way. For example, although each account might need X amount of cache only while it was active, the 100+ environments would grow to consume 100 * X amounts of cache. In upcoming JE releases, JE will provide the option of sharing a cache between environments, but that functionality isn't publicly available yet.

Option 1 & 3 are both plausible, though some operations in Option 1 will cost more. Whether this is an issue depends on your application access patterns. We'll describe some JE internals to explain the trade offs between option 1 & 3.

Each JE database is implemented as a btree. In addition, the environment maintains an additional, internal btree, which we call a mapping tree, to serve as an index to the databases. So in Option 1, you have a mapping tree with 1000 records, which serves to direct you to your 1000 databases. In Option 3, the mapping tree is far smaller: in the simplest case it has 1 record, which points you to 1 larger database. In theory, JE can support an unlimited number of databases because the mapping tree can grow to any size. In practice, the following factors apply right now to database manipulation.

  • Currently, the mapping tree stays pinned in the JE cache, so having more open databases consumes additional memory. Each open database consumes along the lines of 2000 bytes. This has been optimized in upcoming JE releases so that the mapping tree is not pinned, and there is no additional memory cost, but this is not yet available publicly.
  • Environment.openDatabase() does a number of administrative checks and lookups, so it's more expensive to open and close databases than to find a record in a database.
  • It costs more to checkpoint the root of a database than other portions. Whether this matters depends on how the application accesses the database. For example, it costs marginally less to insert 1000 records into 1 database than to insert 1 record into 1000 databases. However, it costs much less to checkpoint the former than the latter.

Suppose we update 1 record in 1000 databases. In a small test program, the checkpoint takes around 730 ms.

Suppose we update 1000 records in 1 database. In the same test, the checkpoint takes around 15 ms.

Another issue is:

  • Database.getDatabaseName() does a linear search of the records in the mapping tree and is slow, but a workaround is to store the name in your own application, with the reference to the Database.

In the end, Option 1 is slower, but that may be fine for your application, when you consider the pattern of account access and the additional cost of adding and removing accounts if you handle it within the application.
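
For readers weighing Option 3, prefixing each key with an account id could look like the following minimal sketch. It is plain Java, not JE code: the class and method names are hypothetical, and it assumes a non-negative 4-byte account id written in big-endian order, so that keys of one account stay adjacent when keys are compared byte by byte as unsigned values.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Builds and splits keys of the form [4-byte account id][record key]. */
final class AccountKeys {
    private static final int PREFIX_LENGTH = Integer.BYTES;

    /** Returns the record key with the account id prepended in big-endian order. */
    static byte[] prefixedKey(int accountId, byte[] recordKey) {
        if (accountId < 0) {
            throw new IllegalArgumentException("account id must be non-negative");
        }
        return ByteBuffer.allocate(PREFIX_LENGTH + recordKey.length)
                .putInt(accountId) // ByteBuffer is big-endian by default
                .put(recordKey)
                .array();
    }

    /** Extracts the account id from a prefixed key. */
    static int accountIdOf(byte[] prefixedKey) {
        return ByteBuffer.wrap(prefixedKey).getInt();
    }

    /** Extracts the original record key from a prefixed key. */
    static byte[] recordKeyOf(byte[] prefixedKey) {
        return Arrays.copyOfRange(prefixedKey, PREFIX_LENGTH, prefixedKey.length);
    }
}
```

With keys laid out this way, all records for one account form a contiguous key range, which is what makes per-account scans possible inside a shared database.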


Is a larger cache always better for JE?

In general, JE performs best when its working set fits within cache. But due to the interaction of Java garbage collection and JE, there can be scenarios when JE actually performs better with a smaller cache.

JE caches items by keeping references to database objects. To keep within the memory budget mandated by the cache size, JE will release references to those objects and they will be garbage collected by the JVM. Many JVMs use an approach called generational garbage collection. Objects are categorized by age in order to apply different collection heuristics. Garbage collecting items from the younger space is cheaper and is done with a "partial GC" pass while longer-lived items require a more expensive "Full GC".

If the application tends to access data records that are rarely re-used, and the JE cache has excessive capacity, the JE cache will become populated with data records that are no longer needed by the application. These data records will eventually age and the JVM will re-categorize them as older objects, which then provokes more Full GC. If the JE cache is smaller, JE itself will tend to dereference, or evict these once-used records more frequently and the JVM will have younger objects to garbage collect.

Garbage collection is really only an issue when the application is CPU bound. To find this point of equilibrium, the user can monitor EnvironmentStats.nCacheMiss and the application's throughput. Reducing the cache to the smallest size where nCacheMiss is 0 will show the optimal performance. Enabling GC statistics in the JVM can help too. (In the Java SE 5 JVM this is enabled with "-verbose:gc", "-XX:+PrintGCDetails", and "-XX:+PrintGCTimeStamps".)


What are JE read buffers and when should I change their size?

JE follows two patterns when reading items from disk. In one mode a single database object, which might be a btree node or a single data record, is faulted in because the application is executing a Database or Cursor operation and cannot find the item in cache. In a second mode, JE will read large sequential portions of the log on behalf of activities like environment startup or log cleaning, and will read in one or multiple objects.

Single object reads use temporary buffers of a size specified by je.log.faultReadSize while sequential reads use temporary buffers of a size specified by je.log.iteratorReadSize. The defaults for these properties are listed in <jeHome>/example.properties, and are currently 2K and 8K respectively.

The ideal read buffer size is as small as possible to reduce memory consumption but is also large enough to adequately fit in most database objects. Because JE must fit the whole database object into a buffer when doing a single object read, a too-small read buffer for single object reads can result in wasted, repeated read calls. When doing sequential reading, JE can piece together parts of a database object, but a too-small read buffer for sequential reads may result in excessive copying of data. The nRepeatFaultReads and nRepeatIteratorReads fields in EnvironmentStats show the number of wasted reads for single and sequential object reading.

If nRepeatFaultReads is greater than 0, the application may try increasing the value of je.log.faultReadSize. If nRepeatIteratorReads is greater than 0, the application may want to adjust je.log.iteratorReadSize and je.log.iteratorMaxSize.


What are JE write buffers and when should I change their size?

JE log files are append only, and all record insertions, deletions, and modifications are added to the end of the current log file. See the FAQ on What is so different about JE log files? for more information.

New data is buffered in write log buffers before being flushed to disk. As each log buffer is filled, a write system call is issued. As each .jdb file reaches its maximum size, an fsync system call is issued and a new .jdb file is created.

Increasing the write log buffer size and the JE log file size can improve write performance by decreasing the number of write and fsync calls. However, write log buffer size has to be balanced against the total JE memory budget, which is represented by je.maxMemory or EnvironmentConfig.getCacheSize(). It may be more productive to use available memory to cache database objects rather than write log buffers. Likewise, increasing the JE log file size can make it harder for the log cleaner to effectively compress the log.

The number and size of the write log buffers is determined by je.log.bufferSize, je.log.numBuffers, and je.log.totalBufferBytes. By default, there are 3 write log buffers and they consume 7% of je.maxMemory. The nLogBuffers and bufferBytes fields in EnvironmentStats will show what the current settings are.

An application can experiment with the impact of changing the number and size of write log buffers. A non-transactional system may benefit by reducing the number of buffers to 2. Any write intensive application may benefit by increasing the log buffer sizes. That's done by setting je.log.totalBufferBytes to the desired value and setting je.log.bufferSize to the total buffer size/number of buffers. Note that JE will restrict write buffers to half of je.maxMemory, so it may be necessary to increase the cache size to grow the write buffers to the desired degree.
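
The arithmetic behind these settings can be sketched in plain Java. The 7% default, the limit of half of je.maxMemory, and the "total divided by number of buffers" rule come from the text above; the class and method names are hypothetical, and JE's internal calculation may differ in rounding or other details.

```java
/** Back-of-the-envelope sizing for JE write log buffers. */
final class LogBufferSizing {
    /** Default total for the write buffers: 7% of the memory budget. */
    static long defaultTotalBufferBytes(long maxMemory) {
        return maxMemory * 7 / 100;
    }

    /** JE restricts write buffers to half of the memory budget. */
    static long cappedTotalBufferBytes(long requestedTotal, long maxMemory) {
        return Math.min(requestedTotal, maxMemory / 2);
    }

    /** Value for je.log.bufferSize: total buffer bytes divided by the buffer count. */
    static long bufferSize(long totalBufferBytes, int numBuffers) {
        if (numBuffers <= 0) {
            throw new IllegalArgumentException("numBuffers must be positive");
        }
        return totalBufferBytes / numBuffers;
    }
}
```

For instance, with a memory budget of 100,000,000 bytes, asking for 80,000,000 bytes of write buffers would be limited to 50,000,000, so the cache size would have to grow before the buffers could reach the requested total.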


Why is my application performance slower with transactions?

Many users see a large performance difference when they enable or disable transactions in their application, without doing any tuning or special configuration.

The performance difference is the result of the durability (the D in ACID) of transactions. When transactions are configured, the default configuration is full durability: at each transaction commit, the transaction data is flushed to disk. This guarantees that the data is recoverable in the event of an application crash or an OS crash; however, it comes with a large performance penalty because the data is written physically to disk.

If you need transactions (for atomicity, for example, the A in ACID) but you don't need full durability, you can relax the durability requirement. When using transactions there are three durability options:

  • commitSync—flush all the way to disk. This provides durability in the face of an OS or application crash.
  • commitWriteNoSync—write to the file system buffers but do not force a sync to disk. This provides durability in the face of an application crash, but data may be lost in the event of an OS crash.
  • commitNoSync—do not write or sync to disk. This provides no durability guarantees in the event of a crash.
You can call these specific Transaction methods, or you can call commit and change the default using an