Frequently Asked Questions About Berkeley DB

Frequently Asked Questions About Berkeley DB
Specific to the SQL API

What exactly is a Berkeley DB database? Is it the same as a relational (SQL) database table?

Yes, conceptually a Berkeley DB database is a single relational database table. Also you can consider key/value pairs each as a single row in a table where the columns are data encoded within either the key or the value by the application.

Does the move to the AGPL open source license affect customers who purchased a commercial license?

No. If you own a commercial license, you can continue to download and use new releases of Berkeley DB under the terms of your commercial license. Berkeley DB continues to be dual license, open source or commercial. The open source license has changed from the (proprietary) Sleepycat license to the (industry standard) AGPL.

Do I need to include a monitor process in my design for a multi-process Berkeley DB application?

Take a minute to read these two sections of the sections of the Berkeley DB Reference Guide, they cover this topic in great detail.

How can I associate application information with a database or database environment handle?

When using the C API, the DB and DB_ENV handles each contain an app_private field intended to be used to reference application-specific information. See the db_create and db_env_create documentation for more information.

In the C++ or Java APIs, the easiest way to associate application-specific data with a handle is to subclass the Db and DbEnv handles, for example subclassing Db to get MyDb. Objects of type MyDb will still have the Berkeley DB API methods available on them, and you can put any extra data or methods you want into the MyDb class. If you are using callback APIs that take Db or DbEnv arguments (for example, the Db.set_bt_compare() method), these will always be called with the Db or DbEnv objects you create. So if you always use MyDb objects, you will be able to take the first argument to the callback function and cast it to a MyDb (in C++, cast it to (MyDb*)). That will allow you to access your data members or methods.

What is the ECCN (Export Control Classification Number) for Berkeley DB?

Berkeley DB does optionally include strong cryptographic support. Export/import and/or use of cryptography software, or even communicating technical details about cryptography software, is illegal in some parts of the world. You are strongly advised to pay close attention to any export/import and/or use laws which apply to you when you import a release of Berkeley DB including cryptography to your country or re-distribute source code from it in any way. If this is a concern, we recommend downloading the NC (for non-crypto) versions of Berkeley DB product downloads because they do not include cryptography code.

Why do I get a compilation error on Windows when I create a project using MFC or anything that includes oledb.h?

Berkeley DB's header file db.h and Microsoft's header file oledb.h both define the symbol DBTYPE. Unfortunately, changing either use of this symbol would break existing code.

The first and simplest solution to this problem is to organize your source code so that only one of these two header files is needed in any of your sources. In other words, separate the uses of Berkeley DB and the uses of Microsoft's OLE DB library so that they are not mixed in your code.

Then, just choose either db.h or oledb.h, but do not mix them in one source file.

If that is not possible, and you have to mix both headers, wrap one of the #include lines as follows.

Find the line where oledb.h is included in your source code. This may be in the automatically-generated stdafx.h include file. Decide whether that header file is really needed. If it is, change the include line from this:

#include <oledb.h>

to this:


/* Work around DBTYPE name conflict with Berkeley DB */
#define DBTYPE MS_DBTYPE
#include <oledb.h>
#undef DBTYPE

Then if you need to use Microsoft's DBTYPE, refer to it as MS_DBTYPE. Alternatively, for C applications, you can wrap the include of db.h in a similar way. You can not wrap db_cxx.h using this technique. If you are using the C++ interface to Berkeley DB, you need to with avoid mixing oledb.h with db_cxx.h or wrap the include of oledb.h as described above.

What do I do when I run out of lockers / locks / lock objects?

The Berkeley DB environment keeps memory for a fixed number of lockers, locks and lock objects -- so it is always possible to run out of these resources. The maximum amount of lock resources to be allocated is set when the database environment is created, so to change this number, you will need to increase the increase the number of lockers, locks and/or lock objects and re-create your environment. See the Configuring locking: sizing the system section of the Berkeley DB Reference Guide for more information.

Berkeley DB returned error number 12 or 22. What does that mean?

The application is calling the Berkeley DB API incorrectly or configuring the database environment with insufficient resources. The Berkeley DB library outputs a verbose error message whenever it is about to return a general-purpose error, or throw a non-specific exception. Whenever it is not clear why an application call into Berkeley DB is failing, the first step is always to review the verbose error messages, which will almost always explain the problem. See the Run-time error information section of the Berkeley DB Reference Guide for more information. It's also useful to know how Berkeley DB divides up the error name space: Except for the historic dbm, ndbm, and hsearch interfaces, Berkeley DB does not use the global variable errno to return error values. The return values for all Berkeley DB functions are grouped into the following three categories:

A return value of 0 indicates the operation was successful.
A return value that is greater than 0 indicates that there was a system error. The errno value returned by the system is returned by the function; for example, when a Berkeley DB function is unable to allocate memory, the return value from the function will be ENOMEM.
A return value that is less than 0 indicates a condition that was not a system failure, but was not an unqualified success, either. For example, a routine to retrieve a key/data pair from the database may return DB_NOTFOUND when the key/data pair does not appear in the database; as opposed to the value of 0, which would be returned if the key/data pair were found in the database.

All values returned by Berkeley DB functions are less than 0 in order to avoid conflict with possible values of errno. Specifically, Berkeley DB reserves all values from -30,800 to -30,999 to itself as possible error values. There are a few Berkeley DB interfaces where it is possible for an application function to be called by a Berkeley DB function and subsequently fail with an application-specific return. Such failure returns will be passed back to the function that originally called a Berkeley DB interface. To avoid ambiguity about the cause of the error, error values separate from the Berkeley DB error name space should be used. Finally, you can always get the message string that's associated with the error number that Berkeley DB returns from the db_strerror function. The db_strerror function is a superset of the ANSI C X3.159-1989 (ANSI C) strerror(3) function. If the error number error is greater than or equal to 0, then the string returned by the system function strerror(3) is returned. If the error number is less than 0, an error string appropriate to the corresponding Berkeley DB library error is returned.</p>

Can running out of disk space corrupt a database?

Berkeley DB can continue to run when when out-of-disk-space errors occur, but it requires the application to be transaction protected. Applications which do not enclose update operations in transactions cannot recover from out-of-disk-space errors, and the result of running out of disk space may be database corruption.

Is it possible to corrupt a database when storing multiple databases in a single physical file?

Usually, what appears to be a corrupted database, is usually the result of two database handles not sharing an underlying database environment. See the Opening multiple databases in a single file section of the Berkeley DB Reference Guide for more information.

Why is there a lot of disk activity when I exit my application?

When a Berkeley DB application calls the database handle close method to discard a database handle, the dirty pages in the cache will written to the backing database file by default. To change this behavior, specify the DB_NOSYNC flag to the Db.close (or the noSync flag to the Database.close method when using the Java API); setting this flag will cause the handle close method to ignore dirty pages in the cache. Many applications do not need to flush the dirty pages from the cache when the database handle close method is called. Applications using transactions or replication for durability don't need to flush dirty pages as the transactional mechanisms ensure that no data is ever lost. Further, there is never a requirement to flush the dirty pages from the cache until the database environment is about to be removed: processes can join and leave a database environment without flushing the dirty pages held in the cache, and only when the database environment will never be accessed again should dirty pages be flushed to the backing file.

What causes an unable to initialize mutex: Function not implemented error?

The most common reason for this error in a Berkeley DB application is that a system call underlying a mutex configured by Berkeley DB is not available on the system, thus, the return of ENOSYS, (which is the system error associated with the Function not implemented message). Generally, this happens because the Berkeley DB library build was specifically configured to use POSIX mutexes, and POSIX mutexes aren't available on this system, or the library was configured on a different system where POSIX mutexes were available, and then the library was physically moved to a system where POSIX mutexes were not available. This error happens occasionally on Linux systems, because some Linux systems have POSIX mutex support in the C library configuration, but not in the operating system, or the POSIX mutex support they have is only for intra-process mutexes, not inter-process mutexes. To avoid this error, explicitly specify the mutex implementation DB should use, with the --with-mutex=MUTEX configuration flag:


    --with-mutex=MUTEX
        To force Berkeley DB to use a specific mutex implementation,
        configure with --with-mutex=MUTEX, where MUTEX is the mutex
        implementation you want. For example,
    --with-mutex=x86/gcc-assembly will configure Berkeley DB to use
        the x86 GNU gcc compiler based test-and-set assembly mutexes.
        This is rarely necessary and should be done only when the
        default configuration selects the wrong mutex implementation. A
        list of available mutex implementations can be found in the
        distribution file dist/aclocal/mutex.ac

Why would Berkeley DB operations return errors like EINVAL when it appears that I'm using the API correctly?

This generally happens when using uninitialized memory. It is the responsiblity of the application to zero out the DBT objects before invoking the Berkeley DB API. Before using a DBT, you must initialize all its elements to 0 and then set the ones you are using explicitly. Another common explanation of this symptom is the application may be using Berkeley DB handles in a free-threaded manner, without specifying the DB_THREAD flag to the DB->open or DB_ENV->open methods. Any time you are sharing a handle across multiple threads, you must specify DB_THREAD when you open that handle. Another reason for this symptom is the application is concurrently accessing the database, but not acquiring locks. The Berkeley DB Data Store product does no locking at all; the application must do its own serialization of access to the database to avoid corruption. The Berkeley DB Concurrent Data Store and Berkeley DB Transactional Data Store products do lock the database, but still require that locking be configured.

Are Berkeley DB databases portable between architectures with different integer sizes and different byte orders?

Yes. Databases can be moved between 32- and 64-bit machines, as well as between little- and big-endian machines. See the Selecting a byte order section of the Berkeley DB Reference Guide for more information.

Berkeley DB on IBM

We found that running with memory mapped files on Z/OS is problematic. The visibility of writes in the memory region requires a sync to allow other processes to view them. This greatly slows down performance. We recommend against using memory mapped files when using Z/OS.

Will checkpointing the database environment disrupt, slow or block access to the databases?

A checkpoint doesn't block access to the Berkeley DB database environment, and threads of control can continue to read and write databases during checkpoint. However, the checkpoint potentially triggers a large amount of I/O which could slow other threads of control, and make it appear that access has been blocked. You can use the DB_ENV->memp_trickle method to spread out the I/O that checkpoint will need to perform (the DB_ENV->memp_trickle method ensures a specified percent of the pages in the cache are kept clean). Alternatively, you can limit the number of sequential write operations scheduled by the DB library, using the DB_ENV->memp_set_max_write method. The DB_ENV->memp_set_max_write method affects all of the methods that flush the database cache (checkpoint, as well as other methods, for example, DB->sync).

What does it mean when the LSN values are past the end of the log?

LSN stands for log sequence number. Sometimes when this happens you'll also see the message, transactional database environment cannot be recovered. Log files contain information about transactions, when you see this message it is likely that the application or some other process or systems anomoly may have removed some or all of the log files without properly resetting the database log sequence numbers (LSNs). Log files should never be removed unless explicitly authorized by the db_archive utility or the DB_ENV->log_archive method. Note that those interfaces will never authorize removal of all existing log files. It may also be that an application has created a database file in one transactional environment and then moved it into another transactional environment. While it is possible to create databases in non-transactional environments (for example, when doing bulk database loads) and then move them into transactional environments, once a database has been used in a transactional environment, it cannot be moved to another environment without first resetting the database log sequence numbers. See the DB_ENV->lsn_reset method documentation for more information.

Why is my transactional application experiencing what appears to be an inordinately high number of deadlocks?

There are a few potential common explanations for this kind of behavior:

The application may be acquiring database objects in inconsistent orders; having threads of control always acquire objects in the same order will reduce the frequency of deadlocks.
If you frequently read a piece of data, modify it and then write it, you may be inadvertently causing a large number of deadlocks. Try specifying the DB_RMW flag on your {{DB->get()}}} calls.
Finally, reducing the isolation level can significantly reduce the number of deadlocks seen by the application. See the Isolation and Degrees of isolation sections of the Berkeley DB Reference Guide for more information.

What does the error: Unable to allocate memory for transaction detail mean?

This error means the maximum number of active transactions configured for Berkeley DB has been reached. The Berkeley DB environment should be configured to support more active transactions. When all of the memory available in the database environment for transactions is in use, calls to being a transaction will fail until some active transactions complete. By default, the database environment is configured to support at least 20 active transactions. For more information see the Configuring transactions section of the Berkeley DB Reference Guide.

How do I use the DB_AUTO_COMMIT flag with a database cursor?

The DB_AUTO_COMMIT flag does not apply to cursors. If you want transactions for cursor operations you must create and use an explicit transaction.

Can Berkeley DB use NFS, SAN, or other remote/shared/network filesystems for databases and their environments?

Berkeley DB works great with a SAN (and with any other filesystem type as far as we know), but if you attempt to access any filesystem from multiple machines, you are treating the filesystem as a shared, remote filesystem and this can cause problems for Berkeley DB. See the Remote filesystems section of the Berkeley DB Reference Guide for more information. There are two problems with shared/remote filesystems, mutexes and cache consistency. First, mutexes: For remote filesystems that do allow remote files to be mapped into process memory, database environment directories accessed via remote filesystems cannot be used simultaneously from multiple clients (that is, from multiple computers). No commercial remote filesystem of which we're aware supports coherent, distributed shared memory for remote-mounted files. As a result, different machines will see different versions of these shared region files, and the behavior is undefined. For example, if machine A opens a database environment on a remote filesystem, and machine B does the same, then machine A acquires a mutex backed in that remote filesystem, the mutex won't correctly serialize machine B. That means both machines are potentially modifying a single data structure at the same time, and any bad database thing you can imagine can happen as a result. Second caches:

Databases, log files, and temporary files may be placed on remote filesystems, as long as the remote filesystem fully supports standard POSIX filesystem semantics (although the application may incur a performance penalty for doing so). Further, read-only databases on remote filesystems can be accessed from multiple systems simultaneously. However, it is difficult (or impossible) for modifiable databases on remote filesystems to be accessed from multiple systems simultaneously. The reason is the Berkeley DB library caches modified database pages, and when those modified pages are written to the backing file is not entirely under application control. If two systems were to write database pages to the remote filesystem at the same time, database corruption could result. If a system were to write a database page back to the remote filesystem at the same time as another system read a page, a core dump in the reader could result. For example, if machine A reads page 5 of a database (and page 5 references page 6), then machine B writes page 6 of the database, and then machine A reads page 6 of the database, machine A has an inconsistent page 5 and page 6, which can lead to incorrect or inconsistent data being returned to the application, or even core dumps. The core dumps and inconsistent data are limited to the readers in this scenario, and some applications might choose to live with that. You can, of course, serialize access to the databases outside of Berkeley DB, but that would imply a pretty significant hit to the overall performance of the system. So Berkeley DB is not designed to fully support multi-system concurrent access to a database environment on a shared disk available either on a network filesystem or a SAN.

Can Berkeley DB store data to raw disk partitions?

Berkeley DB wasn't designed to use raw disk partitions, for a few different reasons:

First, using a raw disk partition requires specialized archival, tuning and other database administration tools, because you can't trivially write tools to access the physical database and other files. Berkeley DB's design allows use of existing tools for database administration (for example, using the POSIX cp, tar, or pax utilities for hot backups), resulting in better integration into the local environment. This is also an advantage for the 3rd party software vendors that license DB, as they don't want to require non-standard archival procedures or tools or having to create and provide the same to their customers. Another reason is it's difficult or impossible to extend raw partitions, and so it becomes significantly harder to change the size of a database installation. When using the file system as DB does, you can mount another partition or disk, and you're done. (I should mention that DB database applications are able to continue running when there is no disk space available, unlike many database products -- because other applications can run DB applications out of disk space, it was necessary to make DB resilient to a lack of disk space.)
Second, raw partitions don't return much of a performance improvement anyway, at least on modern operating systems. Transactional performance in a write-ahead logging database system is usually bounded by writing log files, which are written sequentially, and writing the file sequentially minimizes any file system overhead. In terms of operation latency, Berkeley DB will only go to the file system if a read or write misses in the cache. Certainly, data sets exist where the working set doesn't fit into available cache, but there aren't many of them. In short, porting DB to raw partitions would not improve performance for applications where the working set fits into cache. You can put 100's of GB of cache on a system now, and that can handle a pretty large working set.
Third, there would be a lot of additional code needed to name files on the raw partition, allocate blocks to files, extend files, and so on. You have to write a layer that looks a lot like a file system, significantly increasing Berkeley DB's footprint, and that code will require continuous porting and performance tuning. If DB implemented its own file system, we would have to invest time in adding new drivers and working on disk performance optimizations, and that's not where we have deep expertise, it's not where we can add value. In the case of the existing architecture, customers don't have to worry if DB can run on a new piece of hardware, they just know it will. Further, DB automatically performs better as the underlying file system is tuned, or new file systems are rolled out for new types of disks (for example, solid-state disks, NVRAM disks, or new RAID devices).

All that said, the one strong argument for porting to a raw partition is to avoid double buffering (where a copy of a Berkeley DB database page is held in both the DB cache and the operating system's buffer cache). Fortunately, modern operating systems allow you to configure I/O to copy directly to/from the DB cache, avoiding the OS buffer cache and double buffering. It would not be a lot of work to change Berkeley DB to create databases on a raw partition: simply replace the underlying open, read, write and lseek interface calls to work on a raw partition. However, making Berkeley DB fully functional in that environment would require a lot of work, in order to make the general administration of the database environment work as smoothly as it does now. That said, third-party researchers experimenting with Berkeley DB have done this. Their work is not available in any form, but serves as a proof point.

Why is my page-fill factor very low despite using integers for keys in a btree database?

This is usually the result of using integer keys on little-endian architectures such as the x86. Berkeley DB's default key comparison function, the code that is used to order the keys in your btree, treats the data in the DBT which you know to be an integer as a byte strings. As it turns out, little-endian integers don't sort well when compared as though they were strings. For example, take the numbers 254 through 257. Their byte patterns on a little-endian system are:


254     fe 0 0 0
255     ff 0 0 0
256      0 1 0 0
257      1 1 0 0

If you treat them as strings, then they sort badly:

On a big-endian system, their byte patterns are:


254     0 0 0 fe
255     0 0 0 ff
256     0 0 1 1
257     0 0 1 1

and so, even when compared as strings the keys will sort nicely. Which means, if you use steadily increasing integers as keys on a big-endian system Berkeley DB behaves well and you get compact trees, but on a little-endian system Berkeley DB produces much less compact trees. To avoid this problem, you may want to convert the keys to flat text or big-endian representations, or provide your own Btree comparison function using the DB->set_bt_compare method.

How do I guarantee that Replication Manager connections to or from a dual stack machine use IPv4?

Replication Manager uses IPv6 addresses whenever possible. A dual stack machine is a machine that supports use of both IPv6 and IPv4. It is common practice to define a host name that maps to both IPv6 and IPv4 addresses on a dual stack machine. When such a host name is used to specify a Replication Manager site, Replication Manager always tries IPv6 addresses first.

In order to use IPv4 connections, you must specify Replication Manager sites on dual stack machines in one of the following ways: use a host name that only maps to IPv4 addresses or use an IPv4 numeric address in dot-decimal notation.

Which upgrades require extra steps to start using Replication Manager Group Membership?

Replication Manager introduced Group Membership in the Berkeley DB 11.2.5.2 release. This feature improves Replication Manager's ability to manage various replication activities based on the number of sites in the replication group.

If you are a Replication Manager user upgrading from a version of Berkeley DB prior to 11.2.5.2 to Berkeley DB 11.2.5.2 or later, you need to follow some additional upgrade steps to start using Group Membership. These steps are described in Upgrading groups.

Specific to the SQL API

I thought Berkeley DB was a key/value database, what is the SQL API?

Berkeley DB's SQL API is the SQL processing top half of SQLite layered on top of the storage engine of Berkeley DB. This combination of Berkeley DB and SQLite provides a SQL command line utility called dbsql (a functional equivalent of the sqlite3 command line utility), third party libraries (including JDBC and ODBC), and the SQLite version 3 C/C++ API. This feature is enabled by passing the --enable-sql argument to the configure script before you build the library and tools from the Berkeley DB package.

What are the Benefits of Using the Berkeley DB SQL Interface?

Many, including:

Provides the robustness and reliability of the Berkeley DB storage engine.
Enhances SQLite with concurrency by using fine grained locking, so that multi-threaded and multi-process applications can achieve significant throughput improvements.
Retains the SQLite version 3 interface, so it can be used as a drop in replacement for SQLite applications.
The resulting product retains many of the attractive features of SQLite:
- Small footprint, approximately 1MB
- Low overhead - suitable for embedded applications
- Ease of management for database files
- Compatibility with current set of tools available for SQLite

What are the key benefits of Berkeley DB storage engine?

Many, including:

Full ACID compliance, proven in millions of deployments
Local, in-process data storage
Fine-grained and configurable locking for highly concurrent systems
Flexible and configurable deadlock detection
Hot and cold backups, log file archival, and full database dumps
Scalable to petabytes of data, billions of records

What are some typical application scenarios?

Low latency and high concurrency RDBMS
Embedded/handheld devices and application software
Websites engine for mass video/music/data storage
Alternative of enterprise RDBMS
Application file format
Command-line dataset analysis tool
Internal or temporary in-memory database
Secure on-line update/sync system
Replacement for ad hoc disk files
Enhancement for SQLite applications

How do I configure Berkeley DB to build the SQLite compatible API and command line tool?

You have a number of options to consider when setting up Berkeley DB's source code for a build. This is done using the configure script in the dist directory. If you were on a UNIX-like system you can see the options by doing:


$ cd db
$ cd build_unix
$ ../dist/configure --help

In that output you'll see:


  --enable-sql            Build the SQL API.
  --enable-sql_compat     Build a drop-in replacement sqlite3 library.
  --enable-jdbc           Build BDB SQL JDBC library.
  --enable-amalgamation   Build a SQL amalgamation instead of building files
                          separately.

So to build a drop in replacement do:

$ ../dist/configure --enable-sql --enable-sql_compat

Where can I find out more about SQLite?

First, recall that the Berkeley DB SQL API is essentially identical to SQLite. The products [do differ in some significan ways http://www.oracle.com/technetwork/database/berkeleydb/bdb-sqlite-comparison-wp-176431.pdf], but it's best to first understand and use [SQLite http://sqlite.org] and then find out what benefits there are to using BDB's SQL API. Alternatively there are many tutorials to help you get started with the Berkeley DB SQL Interface. Remember to build with --enable-sql passed to the configure script to generate the SQL library and command line tool dbsql. If you don't build the drop-in replacement remember that where a tutorial suggests running the sqlite3 command line tool, you'll run the Berkeley DB dbsql tool instead.

Which operating systems and platforms are supported by the Berkeley DB SQL Interface?

Most platforms will be supported. Our developers have tested on:

Linux; Oracle Enterprise Linux, RedHat, CentOS, Ubuntu, etc.
Windows 7, Windows XP, Windows Vista
Solaris
Mac OS/X
Apple iOS, Android 2.x
Cygwin

Systems with support for POSIX should work. Berkeley DB is engineered to use as few operating system calls as possible, this makes it is highly portable.

How do I build on a Unix-like system?

Run the Berkeley DB configure utility as normal, adding the --enable-sql flag. For example:


download db-5.0.XX.tar.gz
$ tar xzvf db-5.0.XX.tar.gz
$ cd db-5.0.XX/build_unix
$ ../dist/configure --enable-sql
$ make

Once complete, you should then have a library called libdb_sql-5.0.la and a command line tool called dbsql that can create and manipulate SQL databases in the following manner:


$ dbsql test.db
Berkeley DB SQL interface version 11.2.5.1.XX
Enter .help for instructions
Enter SQL statements terminated with a ;
dbsql> create table t(a integer, b string);
dbsql> insert into t values (1, 'one');
dbsql> insert into t values (2, 'two');
dbsql> select * from t;
1|one
2|two

Can I build a drop-in replacement for SQLite?

Yes, by specifying the --enable-sql_compat flag to Berkeley DB's configure script. That will create an additional library called libsqlite3.la and an additional binary called sqlite3, which have an identical interface to SQLite.

How do I build with Visual Studio?

Open the Berkeley DB solution in Visual Studio and build the db_sql_shell project. That will produce a binary called dbsql.exe in the directory Win32/{Debug|Release}/.

Is there a list of the build-time options?

In addition to the in addition to the SQLite-specific compile-time flags Berkeley DB's SQL API can be further tailored with the following:

BDBSQL_LOG_REGIONMAX
BDBSQL_OMIT_LEAKCHECK
BDBSQL_OMIT_LOG_REMOVE
BDBSQL_SINGLE_PROCESS
BDBSQL_SINGLE_THREAD
BDBSQL_USER_AUTHENTICATION
BDBSQL_USER_AUTHENTICATION_KEYSTORE
SQLITE_DEFAULT_CACHE_SIZE
SQLITE_DEFAULT_JOURNAL_SIZE_LIMIT
SQLITE_DEFAULT_PAGE_SIZE

Which version of SQL does the product implement?

SQL92, apart from these exceptions.

SQLite data types are documented here, but what are the important implications and how does it affect the BDB integration?

There are no differences between the Berkeley DB's SQL API and the SQLite API related to SQL data types. The code to handle data types in Berkeley DB is identical to the code in SQLite.

The documentation says that SQLite uses a dynamic type system as opposed to a static type system, but that it's backwards compatible. What might be the issues of this from an RDBMS user's standpoint?

First, take a minute to read the SQLite documentation on data types. SQL used to access statically typed databases will be compatible with Berkeley DB.

Does Berkeley DB support Dynamic SQL?

Yes. The application can generate and execute SQL statements at runtime if desired.

Does Berkeley DB support prepared statements?

Yes, SQLite has support for prepared statements via the C/C++ API.

Does Berkeley DB support PL/SQL?

The SQLite alternative to PL/SQL is the native C/C++ interface. As is common with SQLite, there is 3rd Party support for PL/SQL which may be a viable alternative for your application.

Does Berkeley DB support conditional clauses in SQL statements?

Yes, conditional statements are fully supported.

Does Berkeley DB support direct access to cursors via SQL?

No, though cursor semantics are available via the C/C++ APIs.

What is the story about thread safety -- can a sqlite3 db handle be shared between threads?

Berkeley DB library is thread safe, the SQL API uses TDS (transactional data store), you are free to use any variety of threads (BDB is thread agnostic) or multiple processes accessing the database concurrently. Berkeley DB will manage locking, serialization, and other issues for you. You'll want to learn about thread safety in SQLite too.

Is autocommit supported?

Yes, Berkeley DB automatically runs each command in its own transaction, and if the command does not fail, its changes are automatically committed. This mode of operation (implicit transactions) is referred to as autocommit mode. Using explicit transactions may improve system efficiency.

What tools are available for use with Berkeley DB SQL?

Most tools available for SQLite will work with the Berkeley DB application.

For information on other tools that can be used with the SQL interface, see the administration chapter of the Berkeley DB SQL API documentation.

Is there a command line tool for the Berkeley DB interface?

Yes, there is a command line tool called dbsql, which provides a SQL Interface from the command line. If Berkeley DB is configured with the --enable-sql_compat flag, another copy of the command line tool will be built, with the same functionality, called sqlite3.

Does Berkeley DB have an import/export tool?

Yes, the built-in .dump command will produce SQL that can be executed to recreate a database.

Is there a table creation tool?

There are many SQLite database management tools available. Of the list of tools known to work with SQLite one probably satisfies your particular requirements. It is also possible to use the SQLite command line SQL Interpreter to create tables, the command line can be used from a script.

Is there a consistency checking tool?

Yes, there is a consistency checking tool built in. The check is accessed using a PRAGMA integrity_check command. See the pragma page for more information.

What character sets are supported?

This http://www.sqlite.org/datatype3.html">page explains: TEXT. The value is a text string, stored using the database encoding (UTF-8, UTF-16BE or UTF-16-LE).

Is the Shift-JIS (SJIS, emoji) character set supported?

It is possible to store S-JIS directly, but it requires implementing some APIs so that SQLite can interpret the character set. UTF-8 encoding is preferred. Storing Shift-JIS directly, without converting to UTF-8, appears possible.

What data types are supported?

Please read the SQLite data type documentation.

What kind of searches are possible in SQL statements for each Data Type?

See data types and language expression documentation on the SQLite site. Full search, partial match (LIKE, GLOB), and custom regular expressions are supported on all column types.

Default sorting is done based on char code. Does DB has a way to sort not by char code?

Yes. See documentation on collation sequences.

What are the constraints for each Data Type?

Because constraints are implemented in the SQLite code they are identical to SQLite's constraints. Primary keys normally imply a UNIQUE constraint, except for integer primary keys. Unique key constraints are supported. Foreign key constraints are currently not supported (the statements will be parsed, but not enforced).

Are NULL values supported in TEXT columns?

NULL values are supported, they sort before all other items in the table (including other NULL values). So there is no defined order amongst NULL VARCHAR fields.

When a 256KB CSV is stored into an Berkeley DB table, how much space will the database use?

This depends on the page size, the key size (are some values in each comma separated row the key?), page fill factor, and some minimal amount of meta-data overhead. If compression is used it could be smaller than 256KiB depending on the data and the compression algorithm. Berkeley DB will use more than 256KiB because there is some overhead in indexing the keys + some bytes of per page overhead. The index overhead is dependent on the size of the key, so difficult to estimate. For example, if the keys are small, then the index overhead will be small. Every DB adds some overhead for indexing, page header etc.

Are nested transactions supported?

Transactions created using BEGIN...COMMIT do not nest. For nested transactions, use the SAVEPOINT and RELEASE commands. This is identical to the behavior of SQLite.

Is Access Control supported in Berkeley DB?

Access control (authentication, user priviliges) is implemented at an application level.

Are multiple concurrent users supported?

Yes, it is possible to have multiple users connect to a database concurrently. There is no authentication in SQLite (see Q/A above), in this answer we're using the term user to mean thread or process.

Does Berkeley DB support stored procedures?

Stored procedures are not supported by SQLite, and so they are not supported in Berkeley DB's SQL API either.

Does Berkeley DB support triggers?

Yes, triggers are supported, but there are some unsupported uses.

Does Berkeley DB support transactions?

Yes, transactions are supported, with the SQLite semantics, as described here.

Does Berkeley DB support backup?

Yes, backup is supported, via the standard Berkeley DB backup procedure.

Are Berkeley DB operations "consistent" (the "C" in ACID)?

Yes, consistency is guaranteed, unless a database is configured to disable the constraint.

How could I create a in-memory SQL database?

If the filename is :memory:, then a private, temporary in-memory database is created for the connection.

Is it possible to give an individual query a priority?

Currently, there is no support for pre-empting a low-priority operation.

Is it possible to return error codes when the data itself is corrupted?

Yes, all such issues are reported as error codes returned from API calls. The SQLite PRAGMA integrity_check command can be used to explicitly validate a database.

Does Berkeley DB support raw disk access?

Raw disk access is not currently supported, nor are there plans to support this functionality in the future.

Does Berkeley DB support schema migration?

SQLite has support for ALTER TABLE, but it is limited. This is something that future versions of SQLite may change this.

What are the differences between using SQLite and Berkeley DB?

The programming interface (API) and SQL language supported are identical: Berkeley DB uses that part of SQLite without any changes. Similarly for the special commands understood by the command line tool. However, the different storage architecture (and time available for the release) has meant that there are some caveats about 100% compatibility:

some SQLite pragmas are unsupported or have different semantics and default values (e.g. cache size)
differences in locking semantics generally result in higher concurrency, but may also lead to deadlocks occurring in Berkeley DB that did not occur in SQLite (for example, if two transactions are executed concurrently, one updating table A then table B, and the other doing the updates in the opposite order).

Can Berkeley DB open SQLite database files or automatically migrate them?

No, Berkeley DB does not include a complete copy of SQLite's code, we've taken out the btree files. In particular, all of the code relating to manipulating files has been replaced by the Berkeley DB storage engine. You will need to migrate the schema and data using the sqlite3 command line interface to .dump and then the Berkeley DB dbsql command line to .load the data and schema from the dump file.

How can I migrate data from SQLite to Berkeley DB?

Use the sqlite3 command tool that is part of SQLite and run this:

$ sqlite3 my.db '.dump' | dbsql new.db

What does `BEGIN IMMEDIATE/EXCLUSIVE` do in Berkeley DB?

These special transaction modes are used in SQLite to avoid deadlock. Internally, SQLite has the notion of read-only vs update transactions. These keywords apply to update transactions, and fully understanding them requires understanding SQLite locking. In Berkeley DB, these keywords are mostly ignored. That is, they are not required to avoid deadlock (nor do they prevent deadlocks, but such deadlocks are automatically detected by Berkeley DB). The effect of using either keyword is that the transaction starts as an update transaction, which causes exclusive locking for reads (similar to SELECT FOR UPDATE in other RDBMSs). This can reduce the frequency of deadlocks in some applications.

Beginning in 5.2, BEGIN EXCLUSIVE will create a transaction that will block new transactions from starting, and block existing transaction from making forward progress, until the exclusive transaction is committed or aborted.

Other SQLite restrictions/limitations/issues that are removed with BDB?

In Berkeley DB, PRAGMA synchronous has a different meaning to SQLite. In SQLite, with levels lower than FULL, databases may be corrupted after a failure. In Berkeley DB, databases can always be recovered to a consistent state, because write-ahead logging semantics are always enforced on the Berkeley DB log. So the synchronous setting can improve throughput in Berkeley DB without risking corruption. The synchronous=ON level in Berkeley DB is equivalent to the DB_TXN_WRITENOSYNC flag and implies that transactions committed before a system crash (such as a hard reboot or power loss) may be rolled back on recovery. Application crashes do not cause committed transactions to be rolled back at this level. The synchronous=OFF level is equivalent to the DB_TXN_NOSYNC flag and implies that committed transactions may be rolled back after either an application or a system crash.

Does Berkeley DB support ADO.NET?

Yes - there is an ADO.NET package for Berkeley DB. It is available from the Berkeley DB Download page, and includes documentation. If you need help ask a question on our forums.

Should I be concerned about the security warning when opening the ADO.NET solution file in Visual Studio 2008?

No - you don't need to be concerned. The warning appears because we auto-generate the build files outside of Visual Studio. There is nothing harmful in our files.

The exact warning message is: "Security warning for System.Data.SQLite.2008"

I see an ADO.NET package, but do you have a System.Data.SQLite package?

These are the same thing. The ADO.NET package provides an implementation of System.Data.SQLite for .NET.

How do I use fts1 and fts2 in Berkeley DB?

Please contact us if you need to use fts1 and fts2.

What are the minimum and default memory requirements?

Cache size 10MiB by default, minimum 100KiB. Log buffer size is 32KiB. Typically an additional 100KiB is used for other purposes. The minimum total size is approximately 1MiB.

What is the default page size?

Berkeley DB configures a default page size based on the underlying file systems block size. See the documentation about page size tuning in Berkeley DB for more information.

Are there any constraints about number of tables?

The number of tables open is generally only limited by the number of file handles that can be opened.

Are there any constraints on table size?

The table size is generally limited by the maximum file size possible on the file system.

Are there any constraints on the number of records that can be stored in a table?

There is no practical limit. It is possible to store the number of records that can be indexed in a signed 64-bit value.

Are there any constraints on record size?

There is no practical constraint. The maximum length of a string or blob field is 1 billion bytes.

How smart is the SQLite optimizer? I saw online that it doesn't gather stats. Does that mean that it's sensitive to how the WHERE clauses are ordered? In other words, does the optimizer assume that the WHERE clause dictates the join order?

The SQLite optimizer is not particularly smart: it has some heuristics that boil down to using an index whenever one is available. There is some modeling of costs of queries, and there optional support for an analyze command that maintains statistics, but they are not very sophisticated. You can always find more detail on the internal workings of SQLite on its website as well as in this SQLite optimization FAQ, though some of it is out of date.

I'd be interested in understanding the indexing mechanisms a little better and how they relate to BDB vs SQLite. For example, give the low fill factor, should we expect that a BDB version of a SQLite database will be larger and by how much?

There are several sources of file size overhead in the Berkeley DB storage engine:

Berkeley DB uses the SQLite encoding of values directly, with the exception of a custom encoding for row IDs. That encoding includes length information, so it effectively duplicates the length that Berkeley DB maintains for each key/data pair.
Berkeley DB doesn't have any built-in support for databases where the data values will all be zero length: it stores ~8 bytes per index record to indicate a zero-length data item, where SQLite has a table type specially for this purpose.
the order of insertion into indices can lead to poor fill factors in Berkeley DB (that is, up to 50% free space in leaf pages of indices).

It is impossible to say exactly what impact these factors will have on database file size: the worst case would be with many indices and small records. With larger records and fewer indices, the file size may be in the same ballpark as SQLite.

I understand the Btree primary/secondary indexing scheme for BDB, but my guess is that this is different for SQL data. I'd like to understand how it works for SQL data in BDB and the implications of that.

When you create a table in SQL, a primary subdatabase is created in the Berkeley DB file. This primary database has integer keys that are usually allocated by SQLite and can be referred to from SQL by the implicit column name rowid. Each index that is declared in SQL creates an additional subdatabase, containing keys that include an encoding of the fields from the SQL columns making up the index, along with the row ID of the matching row in the primary. The Berkeley DB data part of these records is always empty: SQLite only ever sets or uses the keys. One special case is worth mentioning: applications that want an integer primary key for a table should declare the table as:

create table t(id integer primary key, ...)

This is handled by the SQLite parser, making the id an alias for the row ID. In other words, this avoids the maintenance of a secondary index, and the associated lookup cost at runtime. No other syntax, including creating an integer primary key index as a separate statement, will be handled in this special way. Note that Berkeley DB's built-in support for secondary indices and foreign keys is not used by the SQL Interface: indices are maintained by the SQLite query processor.

SQLite gets a shared read or exclusive write lock on a given file at a time. Does file mean a whole database or a single table?

A file refers to an entire SQL database (tables, indexes, everything), so SQLite usually does database-wide locking. The locking situation is more complex when SQLite is used in shared cache mode, where some operations lock a table rather than the whole file, but only relative to threads in the same process sharing the cache.

What other locking implications are there? For example, a large table scan requires BDB to acquire and release read locks on every page, but SQLite native only gets/releases a single lock. How much of an impact does this have on performance and what types of queries are most/least affected by it?

Locking has a significant impact on some queries: we have seen up to 30% of the runtime for some queries connected with locking (and latching). While scans do lock each leaf page in a table, they generally map to a bulk get, and the locking cost is often not significant. The most costly queries are ones that involve many random lookups, since each individual lookup acquires page locks in Berkeley DB. In particular, queries that use an index, but have moderate to high selectivity will typically perform worst with Berkeley DB: the relevant rows are identified quickly with an index scan, but looking up each row in the primary has a high overhead. Workarounds are standard RDBMS optimization techniques: avoid using an index (scan the primary), or to create a covering index (one which contains all columns of interest), so the primary does not need to be accessed.

Is PRAGMA synchronous=OFF equivalent to DB_TXN_WRITE_NOSYNC or DB_TXN_NOSYNC?

synchronous=OFF - is equivalent to setting DB_TXN_NOSYNC
synchronous=ON - is equivalent to setting DB_TXN_WRITE_NOSYNC
synchronous=FULL - is equivalent to Berkeley DB default behavior, where the log is flushed to stable storage for every commit.

FULL is the default setting, if no pragma is used.

How different is the Berkeley DB VACUUM from SQLite's?

SQLite implements the VACUUM command as a database dump followed by a complete reload from that dump. It is an expensive operation, locking the entire database for the duration of the operation. It is also an all or nothing operation. Either it works, or it fails and you have to try again sometime. When SQLite finishes, the database is frequently smaller in size (file size is smaller) and the btree is better organized (shallower) than before due to in-order key insertion of the data from the dump file. SQLite, when it works and when you can afford locking everyone out of the database, does a good job of VACUUM. Berkeley DB approaches this in a completely different way. For many releases now Berkeley DB's B-Tree implementation has had the ability to compact while other oprations are in-flight. Compacting is a process wherein the B-Tree nodes are examined and, when less than optimal, they are re-organized (reverse split, etc.). The more shallow your B-Tree, the fewer lookups required to find the data at a leaf node. Berkeley DB can compact sections of the tree, or the whole tree at once. For 7x24x365 (five-nines) operation this is critical. The BDB version of compact won't adversly impact ongoing database operations whereas SQLite's approach does. But compaction doesn't address empty sections of the database (segments of the database file where deleted data once lived). Berkeley DB also supports compression of database files by moving data within the file, then truncating the file returning that space to the filesystem. As of release 5.1 of Berkeley DB, the VACUUM command will compact and compress the database file(s). This operation takes more time than the dump/load approach of SQLite because it is doing more work to allow for the database to remain operational. We believe this is the right trade-off, but if you disagree you can always dump/load the database in your code.

Does SQLite re-use deleted space on a page in the same way that BDB does (using the Btree key to determine which page the data should be stored on)?

Yes, the btree implementations are fairly similar at this level. SQLite uses a different algorithm for keeping its tree balanced, but space from deleted rows on a page can be reused, and once all rows are deleted from a page, the page is put on a free list and can be recycled. In Berkeley DB, if it happens that some pages at the end of the file become free, the file is automatically truncated.

How is deadlock detection configured?

Automatic deadlock detection is enabled, it uses the default deadlock resolution algorithm. See the [" hre="http://download.oracle.com/docs/cd/E17076_02/html/api_reference/C/envset_lk_detect.html set_lk_detect documentation] for further information.

When are log files removed? What are the implications for hot backups and incremental backups?

By default, log files are automatically removed when all the information they contain has been synced to the database file. That means that incremental backups need special care in order to work. Automatic log file removal can be disabled at compile time by specifying the BDBSQL_OMIT_LOG_REMOVE preprocessor flag.

What tuning tools are available for use with Berkeley DB?

None specifically, but there are many resources available for performance tuning SQLite applications/databases.

Are there any special considerations when connecting custom processes to a SQL database (for example checkpoint, trickle, etc)?

Yes. If you intend to connect these processes at the same time as other access to the database, you should use the DB_REGISTER flag. This means that using the Berkeley DB utilities requires additional care because they generally don't set this flag.

Inside the database file, how are subdatabases named?

SQLite refers to tables with an ID number (in SQLite that's the root page of the btree for that table), whereas Berkeley DB refers to subdatabases by name. The mapping is simply:

sprintf(table_name, table, table_id);

The IDs are allocated so that tables ordinary tables always get an odd number and indices get an even number. You can find the ID for a SQL table called t1 with the following query:

select rootpage from sqlite_master where name=t1;

This can be used, for example, to get statistics about a particular table using the db_stat utility with the -s flag.

Can I use a `DB_CONFIG` file to override default settings?

Yes, create the journal directory and DB_CONFIG file before creating the database. Alternative, when the database file is not in use, create a DB_CONFIG file and run the db_recover utility manually. Note that before running any Berkeley DB utility, the DB_CONFIG file for SQL databases should contain the line:

add_data_dir ..

Is there anything special about temporary databases beyond the fact that they are not persistent? What about `:memory:` databases? Do they use the Berkeley DB cache internally?

Temporary tables don't live in a file on disk, and they have in-memory logs that allow transactions and savepoints to work as expected from SQL. There is no durability, so no recovery after a crash. Other than that, no, there is nothing special. A :memory: database consists entirely of temporary tables. They use an internal cache, allocated when the first temporary table is created.

Does the SQLite C++ API use exceptions?

The SQLite C++ API returns error codes, rather than using exception handling.

How can I deal with `SQLITE_BUSY` easily?

You can register a busy callback handler by SQLite's API:

int sqlite3_busy_handler(sqlite3*, int(*)(void*,int), void*);

That makes it easy to deal with SQLITE_BUSY event automatically. Otherwise, you have to check the return code and retry manually. Alternatively, you can use:

int sqlite3_busy_timeout(sqlite3*, int ms);

to set the timeout threshold. How can I iterate through a result set?

If you are using one-step query execution interface( sqlite_exec()), you can setup a callback function for the 3rd argument of sqlite_exec() to iterate result.
If you are using prepared query execution, you can use sqlite3_column_*() functions to fetch the result of current statement step.
Use sqlite3_get_table() to fetch results into memory data structure directly, but that is subject to the size of result set and available memory of system.

What the difference between `sqlite3_prepare()`, `sqlite3_prepare_v2()`, `sqlite3_prepare16()` and `sqlite3_prepare16_v2()`?

The sqlite3_prepare() and sqlite3_prepare_v2() interfaces use UTF-8, and sqlite3_prepare16() and sqlite3_prepare16_v2() use UTF-16. The sqlite3_prepare_v2() and sqlite3_prepare16_v2() interfaces are recommended for all new programs. The two older interfaces are retained for backwards compatibility, but their use is discouraged. More details are in the SQLite docs on prepare.

Can the users not specify primary key when create a table in Berkeley DB 11gR2?

Yes, when you create a table in Berkeley DB 11gR2, you do not need to specify a primary key. In this case, Berkeley DB 11gR2 will create an implicit integer row ID column as the primary key, so if you specify a non-integer primary key, there will be a second table that maps the primary key to the row ID. There are two Berkeley DB tables in that case:

the primary table containing the complete row, with integer keys (row IDs) that are managed by SQLite;
the index that maps application key -> row ID.

Are there any profiling tools?

You can try one of following ways:

Place explain before SQL statements to get the query plan(VDBE format).
Use method sqlite_analyzer().
Use tracing and profiling functions(sqlite3_trace() and sqlite3_profile()). More details for SQLite C Interface.
Use .timer command of sqlite3 shell.
Use dtrace on Solaris or Mac OS/X

How can I know how many tables are there in a Berkeley DB 11gR2 databases, and what's the description for each table?

In the SQL command line mode, command select * from sqlite_master; will list these information. Please refer to the SQLite tutorial for more supported SQL statements.

Does Berkeley DB support JDBC?

Yes, Berkeley DB includes an open source JDBC driver.

What are the limitations of the supported JDBC driver?

Currently, the JDBC driver doesn't support below properties:

prepareCall
savePoint(setSavepoint/rollback/releaseSavepoint)
prepareStatement() is conditionally supported only when the resultSetType is one of FORWARD_ONLY or SCROLL_INSENSITIVE or SCROLL_SENSITIVE and the resultSetConcurrency is one of CONCUR_REQD_ONLY or CONCUR_UPDATABLE.
getParameterMetaData

Does Berkeley DB support ODBC?

Yes, Berkeley DB includes an open source ODBC driver.

Is it possible to interrupt one search while another search is going on?

Yes, this is supported by sqliteodbc using sqlite3_interrupt.

Does Berkeley DB support cancellation of SQL?

Yes, supported by sqliteodbc using sqlite3_interrupt.

Which version of Windows Mobile does BDB run on?

The Visual Studio 2008 project files will by default build for Windows Mobile 6.5.3 Professional. If you want to build for other platforms such as Windows Mobile 6.0, 6.1, or 6.5, you need to follow the steps in the Windows CE build documentation.

How can I enable transaction snapshots in the SQL API?

Transaction snapshots are not supported in 5.0 or 5.1. Beginning in 5.2, transaction snapshots can be enabled using the pragmas PRAGMA multiversion=on|off and PRAGMA snapshot_isolation=on|off. Using DB_CONFIG to set DB_MULTIVERSION and DB_TXN_SNAPSHOT will not work correctly in any version of the SQL API.

What happens when I turn off replication?

When you use the following pragma to turn off replication:

pragma replication=OFF;

its effect is delayed until all database connections have been disconnected and Berkeley DB recovery has been run on the database. Recovery should be run using the Berkeley DB db_recover utility with the following command:

db_recover -h <database name>-journal

Recovery is run on the journal directory that Berkeley DB creates alongside your database. For example, the database test.db would have the journal directory test.db-journal.

Please note that any changes you make at a site where replication is turned off are likely to disappear if you turn replication back on. The effect of turning replication back on is also delayed until all database connections have been disconnected and you reopen the database. Upon reopening the database, the site synchronizes with the rest of the replication group.

Why do I see performance regression for synchronous commit operations on ext4 file system with the latest Linux Kernel?

This is not a bug of BDB. It's caused by a fix in the ext4 fsync code with this patch from 2.6.32. This fix is required for safe behavior with volatile write caches. The high performance number before is fictitious. With 2.6.32 released with ext4 defaulting to proper behavior on fsync(),we might finally be able to use regular drives with their caches turned on safely, taking advantage of the cache for other writes while doing the right thing with the database writes.

Why do I see performance regression when going from RedHat Enterprise Linux version 4 to version 5?

In RHEL 5, a new kernel parameter was added that specifies whether or not memory-mapped file pages should be flushed to disk by update while the memory map is active. Be default BDB uses memory-mapped files for all its environment files. You may need to set flush_mmap_pages to zero to see the behavior that you saw in RHEL 4.

Why does available memory of my Windows x64 OS get exhausted when doing a lot of database put operations?

When a user does a large number of database put operations in a short time on certain Windows x64 OS platforms (notably Vista, Windows Server 2003, and Windows Server 2008), the working set size of the system file cache will grow to meet this demand. Over time, the available memory can be exhausted, and the system can run painfully slow. This is a bug in the Windows OS, and not in BDB. (For reference, including a link to a list of affected platforms, see knowledge_note). As a workaround, you can turn off buffered I/O in Windows by specifying the DB_DIRECT_DB flag to the environment. This is the equivalent of calling CreateFile with specifying FILE_FLAG_NO_BUFFERING and causes all I/O to go straight to the disk instead of memory. A second option is to use the SetSystemFileCacheSize API provided by Windows to restrict the consumption of physical memory.