Guide to Oracle Berkeley DB for SQL Developers


By Margo Seltzer
 

Get an introduction to using common SQL functions in Oracle Berkeley DB.

Published September 2007

The Oracle Berkeley DB team is frequently asked, "How do I do <SQL query here> in Berkeley DB?" So, this is an introduction to implementing much of your favorite SQL functionality in Oracle Berkeley DB. Not all SQL applications should be implemented in Oracle Berkeley DB--an open source, embeddable database engine that provides fast, reliable, local persistence with zero administration required--but if you have a relatively fixed set of queries and you care about performance, Berkeley DB may just be the way to go.

Let's start at the very beginning (a very good place to start). When you read you begin with ABC; in Berkeley DB you begin with terminology.

Here is a small "translation guide" for diehard SQL programmers:

SQL Term Oracle Berkeley DB Equivalent
Database Environment
Table Database
Tuple/row Key/data pair
Primary index Key
Secondary index Secondary database


Let's pick an application domain—the traditional employee database, but somewhat simplified. Furthermore, we'll assume that you want all of Berkeley DB's bells and whistles: concurrency, transactions, recoverability, and so on.

Creating Databases

In SQL, you say

CREATE DATABASE personnel


In Berkeley DB, you want to create the environment in which you'll place all your application data. Throughout your code, you'll refer to that environment via an environment handle, whose type is DB_ENV. And you'll use that handle to operate upon the environment. For now, we'll ignore any fancy error handling and strictly focus on the APIs.

        DB_ENV *dbenv;
        int ret;
        
        /* Create the handle. */
        DB_ASSERT(db_env_create(&dbenv, 0) == 0);
        
        /*
         * If you wanted to configure the environment, you would do that here.
         * Configuraition might include things like setting a cache size,
         * specifying error handling functions, specifying (different)
         * directories in which to place your log and/or data files, setting
         * parameters to describe how many locks you'd need, etc.
         */
        
        /* Now, open the handle. */
        DB_ASSERT(dbenv->open(dbenv, "my_databases/personnel", 
                DB_CREATE | DB_INIT_LOCK | DB_INIT_MPOOL | DB_INIT_TXN | DB_THREAD, 0644);


You've now created and opened an environment. There are a few things to note:

  • The directory my_databases/personnel must exist before you do this.
  • The final parameter to the open call is the mode of the files that we will create for you as part of this environment.
  • The flags specified here have allowed you to create the environment
    (DB_CREATE), use locking (DB_INIT_LOCK); have a shared memory buffer pool (DB_INIT_MPOOL); use transactions (DB_INIT_TXN); and use the resulting environment handle in different threads of control simultaneously (DB_THREAD).

In SQL, queries are typically processed by a separate server, and that server is configured by a database administrator to work well (or not) on your systems. As Berkeley DB is embedded in your application, your application may perform much of this configuration. However, that's really about database tuning, and we'll leave that for a separate article.

Now that you've created a database, it's time to create some tables. In Berkeley DB, tables are referenced by handles of type DB *. For each table in your application, you'll typically open one handle and then use that handle in one or more threads.

So, in SQL you might say

CREATE TABLE employee
        (primary key empid int(8), last_name varchar(20), first_name varchar(15),
        salary numeric(10, 2) salary, street varchar (20), city varchar (15),
        state char(2), zip int(5))
       

Before we look at the Berkeley DB code to implement this, it's important to remember that in SQL, the database is responsible for implementing and interpreting the schema of your data. In Berkeley DB, this interpretation is left up to the application. This will become more interesting when we examine the data manipulation language (DML), but for now, it will be apparent, because in creating the employee table, Berkeley DB will know only about the primary key and not about the different fields in the database.

First, you need to create a database handle to represent the table that you're creating. (Again, we're skipping error handling.)

        DB *dbp;
        DB_ENV *dbenv;
        
        
        /* Let's assume we've used the code from above to set dbenv. */
        ASSERT(db_create(&dbp, dbenv, 0) == 0);
        
        /*
         * Like with the environment, tables can also be configured. You
         * can specify things like comparison functions, page-size, etc.
         * That would all go here.
         */
        
        /* Now, we'll actually open/create the primary table. */
        
        ASSERT(dbp->open(dbp, NULL, "employee.db", NULL, DB_BTREE,
                DB_AUTO_COMMIT | DB_CREATE | DB_THREAD, 0644) == 0).
       

This call creates the table, using a B-tree as the primary index structure. The table will be materialized in the directory my_databases/personnel with the name employee.db. That file will contain only a single table and will have file system permissions as specified by the final parameter (0644). The flags that we've specified create the table in a transaction, allowing future transactional operations
(DB_AUTO_COMMIT); allow creation of the table if it doesn't exist (DB_CREATE); and specify that the resulting handle can be used by multiple threads of control simultaneously (DB_THREAD).

Notice that you haven't specified exactly what comprises a primary key (index) or what the data fields look like that are stored in this table. That will all fall on the application and will become more apparent when we get to the sections on insert, select, and update.

Now let's consider what would happen had you wanted both a primary index on the employee id and a secondary index on the last name.

You'd use the SQL query specified above and then issue

CREATE INDEX lname ON  employee (last_name)

In Berkeley DB, secondary indexes look just like tables. You can then associate tables to make one a secondary index of the other. In order to implement this functionality, you'll need to dive a bit more deeply into the data representation that your application is going to use.

Let's assume that your application is going to use a C structure to contain the tuples in our employee table. You might define that structure as shown below:

typedef struct _emp_data {
        char    lname[20];
        char    fname[15];
        float   salary;
        char    street[20];
        char    city[15];
        char    state[2];
        int     zip;
} emp_data;


And let's say that the employee ID is a simple integer:

typedef int emp_key;

In Berkeley DB, when you manipulate keys or data items, you use a structure called a DBT. A DBT encapsulates an opaque byte-string, representing it as a pointer and a length. The pointer is referenced by the data field of a DBT, and the length is stored in the size field of the DBT. If you wanted to manipulate the key/data pair representing an employee, you would use one DBT for the emp_key and another for the emp_data.

        DBT     key_dbt, data_dbt;
        emp_key ekey;
        emp_data        edata;
        
        memset(&key_dbt, 0, sizeof(key_dbt));
        memset(&data_dbt, 0, sizeof(data_dbt));
        
        /*
         * Now make the key and data DBT's reference the key and data
         * variables.
         */
        key_dbt.data = &ekey;
        key_dbt.size = sizeof(ekey);
        
        data_dbt.data = &edata;
        data_dbt.size = sizeof(edata);
        
       

The main observation here is that a tuple in SQL is represented by a key/data pair, but the application is responsible for understanding how to interpret these pairs.

With that as background, let's return to our discussion of secondary indexes. Since Berkeley DB does not understand the structure or schema of the data element in a key/data pair, it is going to need assistance from the application to identify the fields that we use as secondary indexes. This assistance is provided by the application by way of callback functions. The callback function takes a key/data pair and returns a DBT that references the value to be used as a secondary key.

So, in order to create the secondary index on last_name, you must write a callback function that takes a key/data pair and returns a DBT referencing the last_name field of that data item.

int
lname_callback(DB *dbp, const DBT *key, const DBT *data, DBT *skey)
{
        emp_data        *edata;
        
        /*
         * We know that the opaque byte-string represented by the data DBT
         * represents one of our emp_data structures, so let's cast it
         * to one of those so that we can manipulate it.
         */
        
        edata = data->data;
        
        skey->data = edata->lname;
        skey->size = strlen((edata->lname);
        
        return (0);
}


Now that you've written your callback, you can specify a secondary index. Recall that a secondary index is simply a table—so let's start by creating a table:

        DB *sdbp;
        
        ASSERT(db_create(&sdbp, dbenv, 0) == 0);
        
        /* Configure sdbp. */
        
        ASSERT(sdbp->open(sdbp, NULL, "emp_lname.db", NULL, DB_BTREE,
                DB_AUTO_COMMIT | DB_CREATE | DB_THREAD, 0644) == 0);
       

Once again, you're using a B-tree structure to index lastnames, and we're keeping all the same flags and modes that you used before.

Finally, you must associate your secondary index table with your main table (the employee table). Recall that dbp is the handle to the employee table and sdbp is the handle to the secondary index table.

ASSERT(dbp->associate(dbp, NULL, sdbp, lname_callback, flags) == 0);
       

Things to note:

  • You can create as many secondary indexes as you like. The key issue is that secondaries slow up insertions (because you have to create an index entry for each secondary), but they greatly improve query performance if you use the secondary key value (for example, the lastname) to look up or select tuples.
  • As long as you have your secondary indexes open and associated whenever you update the main table, your secondaries will be completely up-to-date. However, if you forget to open and associate secondaries and then modify the base table, you may find that your secondaries are no longer up-to-date.
    This practice should be avoided.

The last two operations in the DDL are the drop commands: drop index, drop table, and drop database.

Just as you can drop indexes and delete tables in SQL, you can do the same in Berkeley DB. In SQL, you might say

DROP TABLE employee

or

DROP INDEX lname


Dropping a table in SQL drops all the indexes associated with it, but in Berkeley DB, you have to do that explicitly. Fortunately, dropping tables or indexes are identical operations in Berkeley DB.

Before removing a table, all database handles on that table must be closed. Closing a table is easy; assume that we're going to drop the secondary index on the employee database. Let's first close that secondary:

sdbp->close(sdbp, 0)
       

After issuing the close method on a database handle, the handle cannot be used again.

Now that you've closed the secondary index table, you can remove it using the dbremove method off of the dbenv handle:

DB_ENV *dbenv;
        
ASSERT(dbenv->dbremove(dbenv, NULL, "emp_lname.db", NULL, DB_AUTO_COMMIT) == 0);
       

The same sequence of calls (closing and dbremoving) can be used to drop tables as well.

But let's say that you don't want to drop a table; we just want to change its name. You can do that too.

As with remove, you must first close the table handles:

dbp->close(dbp, 0);
       

Now you can change the table's name:

DB_ENV *dbenv;
        
ASSERT(dbenv->dbrename(dbenv, NULL, "employee.db", NULL, "newemp.db", DB_AUTO_COMMIT) == 0);
       

Finally, you might wish to destroy a database. In SQL, you would execute

DROP DATABASE personnel


This command also has an analogy in Berkeley DB.

First, you have to close the environment.

ASSERT(dbenv->close(dbenv, 0) == 0);

As with closing table handles, once you close an environment handle, you can no longer use that handle. So, in order to drop the database, you'll need to create a new handle and then use that handle to remove the database (environment).

ASSERT(db_env_create(&dbenv, 0) == 0);
ASSERT(dbenv->remove(dbenv,  "my_databases/personnel", 0) == 0);
       
That wraps up our translation of SQL's DDL into Berkeley DB. Next, we'll explore how to translate SQL DML into Berkeley DB.


Performing SQL DML operations in Berkeley DB

Now that we've covered SQL's DDL and its implementation in Berkeley DB, now you'll begin adding data to databases, covering SQL's insert into, update, and delete.

In SQL, you add data to tables using the insert statement:

INSERT INTO employees VALUES  (00010002, "mouse", "mickey", 1000000.00, "Main Street", "Disney Land",  "CA", 98765);


SQL inserts all become Berkeley DB put methods off of database or cursor handles; we'll start with databases and get to cursors later.

Let's assume that you have your table opened from last time and we have a database handle dbp that references the employee table. Now, hire Mickey Mouse.
 
        DB *dbp;
        DBT key_dbt, data_dbt;
        emp_data edata;
        emp_key ekey;
        
        /* Put the value into the employee key. */
        ekey = 00010002;
        
        /* Initialize an emp_data structure. */
        strcpy(edata.lname, "Mouse");
        strcpy(edata.fname, "Mickey");
        edata.salary = 1000000.00;
        strcpy(edata.street, "Main Street");
        strcpy(edata.city, "Disney Land");
        strcpy(edata.state, "CA");
        edata.zip = 98765;
        
        /* Initialize DBTs */
        memset(&key_dbt, 0, sizeof(key_dbt));
        memset(&data_dbt, 0, sizeof(data_dbt));
        
        /* Now, assign key and data values to DBTs. */
        key->data = &ekey;
        key->size = sizeof(ekey);
        data->data = &edata;
        data->size = sizeof(edata);
        
        /* Finally, put the data into the database. */
        ASSERT(dbp->put(dbp, NULL, &key_dbt, &data_dbt, DB_AUTO_COMMIT) == 0);
       

Note that if you had associated any secondaries with the employee table, as in SQL, they would have been updated automatically when you did the insert.

Now, let's say you have some data in your tables, and you want to change it. For example, let's give Mickey a raise! There are a couple of ways we can do this.

The first method is identical to the insert code above—if you issue a put method on a table and the key already exists (and the table does not allow duplicate data values for a single key), then the put will replace the old version with the new. So, the following sequence will replace Mickey's record with one that gives Mickey a $2,000,000 salary instead of a $1,000,000 salary.
     
        /* Put the value into the employee key. */
        ekey = 00010002;
        
        /* Initialize an emp_data structure. */
        strcpy(edata.lname, "Mouse");
        strcpy(edata.fname, "Mickey");
        edata.salary = 2000000.00;
        strcpy(edata.street, "Main Street");
        strcpy(edata.city, "Disney Land");
        strcpy(edata.state, "CA");
        edata.zip = 98765;
        
        /* Initialize DBTs */
        memset(&key_dbt, 0, sizeof(key_dbt));
        memset(&data_dbt, 0, sizeof(data_dbt));
        
        /* Now, assign key and data values to DBTs. */
        key->data = &ekey;
        key->size = sizeof(ekey);
        data->data = &edata;
        data->size = sizeof(edata);
        
        /* Finally, put the data into the database. */
        ASSERT(dbp->put(dbp, NULL, &key_dbt, &data_dbt, DB_AUTO_COMMIT) == 0);
       

Note that this approach is cumbersome—in order to do this, you would have to know the value of all the other fields in the database. So, unlike

UPDATE employees  SET salary = 2000000 WHERE empid = 000100002

where you need only the employee ID, now you need everything. Isn't there a way to do this in Berkeley DB? The answer is yes. If you know exactly which bytes of a data item you wish to replace, you can do the equivalent of the update command.

In order to do this, you'll need to introduce the notion of a cursor. A cursor represents a position in a table. It lets you iterate over the table and maintain the notion of a current item that you can then
manipulate.

Creating a cursor in Berkeley DB is easy—it's a method off of a database handle:

        DBC     *dbc;
        DB      *dbp;
        
        ASSERT(dbp->cursor(dbp, NULL, 0) == 0);
       

Now that you have a cursor, we want to position it on Mickey's record so you can update it. This is equivalent to the WHERE part of the SQL statement.

        DBT     key_dbt, data_dbt;
        emp_data        *edata;
        emp_key         ekey;
        
        /* We'd like to look up Mickey's key. */
        emp_key = 0010002;
        memset(&key_dbt, 0, sizeof(key_dbt));
        key_dbt.data = &emp_key;
        key_dbt.size = sizeof(emp_key);
        
        /*
         * We want the data returned, so we don't need to initialize the
         * employee data data structure.
         */
        memset(&data_dbt, 0, sizeof(data_dbt));
        
        /* Now, set the cursor to the record with the key emp_key. */
        dbc->c_get(dbc, &key_dbt, &data_dbt, DB_SET);
       

Next we can change the salary (handle the "SET salary=2000000" part of the clause)

        /* Change the salary. */
        edata = data_dbt->data;
        edata.salary = 2000000;
       

Finally, apply the UPDATE portion of the SQL statement:

        dbc->c_put(dbc, &key_dbt, &data_dbt, DB_CURRENT);
       

In this case, you did not know the contents of Mickey's record a priori, so you retrieved it and then updated it.

Alternatively, you needn't even retrieve the record. The DB_DBT_PARTIAL flag value on DBTs indicates that you are getting/putting only part of a record, so that Berkeley DB can ignore everything except that part.

Try it again:

        emp_data edata;
        float salary;
        
        /* We'd like to look up Mickey's key. */
        emp_key = 0010002;
        memset(&key_dbt, 0, sizeof(key_dbt));
        key_dbt.data = &emp_key;
        key_dbt.size = sizeof(emp_key);
       

Instead of retrieving the entire record, don't retrieve anything—that is, perform a PARTIAL get, specifying that you only want 0 bytes of the data item.

        /* We don't want the data, we just want to position the cursor. */
        memset(&data_dbt, 0, sizeof(data_dbt));
        data_dbt->flags = DB_DBT_PARTIAL;
        data_dbt->dlen = 0;

        /* Position the cursor on Mickey's record */
        dbc->c_get(dbc, &key_dbt, &data_dbt, DB_SET);

        /*
         * Now, prepare for a partial put. Note that the DBT has already
         * been initialized for partial operations.  We need to specify
         * where in the data item we wish to place the new bytes and
         * how many bytes we'd like to replace.
         */
        salary = 2000000.00;

        /* The DBT contains just the salary information. */
        data_dbt->data = &salary;
        data_dbt->size = sizeof(salary);

        /*
         * dlen and doff tell Berkeley DB where to place this information
         * in the record.  dlen indicates how many bytes we are replacing --
         * in this case we're replacing the length of the salary field in
         * the structure (sizeof(emp_data.salary)).  doff indicates where
         * in the data record we will place these new bytes -- we need to
         * compute the offset of the salary field.
         */
        data_dbt->dlen = sizeof(emp_data.salary);
        data_dbt->doff = ((char *)&edata.salary - (char *)&edata);

        /* Now, put the record back with the new data. */
        dbc->c_put(dbc, &key_dbt, &data_dbt, DB_CURRENT);


Data Retrieval

Once you know how to put data in your tables, it's time to learn how to retrieve it. Let's start with the simplest approach: looking up values by their primary key .

        SELECT * FROM employees   WHERE id=0010002

You've already seen how to do this using a cursor:

        DBT             key_dbt, data_dbt;
        emp_data        *edata;
        emp_key ekey;

        /* We'd like to look up Mickey's key. */
        emp_key = 0010002;
        memset(&key_dbt, 0, sizeof(key_dbt));
        key_dbt.data = &emp_key;
        key_dbt.size = sizeof(emp_key);

        /*
         * We want the data returned, so we don't need to initialize the
         * employee data data structure.
         */
        memset(&data_dbt, 0, sizeof(data_dbt));

        /* Now, set the cursor to the record with the key emp_key. */
        dbc->c_get(dbc, &key_dbt, &data_dbt, DB_SET);


You used a cursor operation above because we wanted to then update the record. Let's say that all you want to do is retrieve the record; then you don't even need a cursor. All you need to do is use the get method off of the dbp handle:

        DBT             key_dbt, data_dbt;
        emp_data        *edata;
        emp_key ekey;

        /* We'd like to look up Mickey's key. */
        emp_key = 0010002;
        memset(&key_dbt, 0, sizeof(key_dbt));
        key_dbt.data = &emp_key;
        key_dbt.size = sizeof(emp_key);

        /*
         * We want the data returned, so we don't need to initialize the
         * employee data data structure.
         */
        memset(&data_dbt, 0, sizeof(data_dbt));

        /* Now, use the dbp method. */
        dbp->get(dbp, NULL, &key_dbt, &data_dbt, 0);


So, this is also identical to the SELECT expression above.

So far, you've always looked up a record by its primary key. But what if you don't know its key? Here are a few ways to find out:

  • Look up a record by its secondary key value.
  • Iterate over a set of items sharing a key.
  • Iterate over the database.

Let's examine each of these in more detail.

Using the Secondary Key

As in SQL, retrieving by a secondary key is remarkably similar to retrieving by a primary key.

In fact, the SQL query looks identical except for its where clause:

SELECT * FROM employees  WHERE last_name = "Mouse"


The Berkeley DB call will look similar to its primary equivalent.

Rather than use dbp, as shown in the primary key example, use the sdbp to look something up by its secondary key:

        DBT     key_dbt, data_dbt;
        emp_data        *edata;

        /* We'd like to look up by Mickey's last name. */
        memset(&key_dbt, 0, sizeof(key_dbt));
        key_dbt.data = "Mouse";
        key_dbt.size = strlen((char *)key_dbt.data);

        /*
         * We want the data returned, so we don't need to initialize the
         * employee data data structure.
         */
        memset(&data_dbt, 0, sizeof(data_dbt));

        /* Now, call the get method. */
        sdbp->get(sdbp, NULL, &key_dbt, &data_dbt, 0);


The interesting thing here is to know what gets returned in the data_dbt. It is the data in the primary database—that is, you get the exact same thing returned in the data DBT regardless of whether you look up the item by its primary or its secondary key.

However, you might notice that when you look up by secondary, the result is not quite the same as either retrieval by primary or the results of the SQL statement. What's missing is the primary key, because there isn't any place to return it. So, in fact, the code above actually implements

        SELECT last_name, first_name, salary, street, city, state, zip FROM
        employees  WHERE  last_name="Mouse"
       

What if you need the primary key? The answer is that you use the dbp->pget or dbc->pget method. These are identical to the get methods except they are designed for secondary index queries when you want the primary key returned. So, in this case, the result includes the primary key, the secondary key, and the data element:

        DBT     key_dbt, pkey_dbt, data_dbt;
        emp_data        *edata;

        /* We'd like to look up by Mickey's last name. */
        memset(&key_dbt, 0, sizeof(key_dbt));
        key_dbt.data = "Mouse";
        key_dbt.size = strlen((char *)key_dbt.data);

        /* Set up the dbt into which to return the primary. */
        memset(&pkey_dbt, 0, sizeof(pkey_dbt));

        /*
         * We want the data returned, so we don't need to initialize the
         * employee data data structure.
         */
        memset(&data_dbt, 0, sizeof(data_dbt));

        /* Now, get the record and the primary key. */

        sdbp->pget(sdbp, NULL, &key_dbt, &pkey_dbt, &data_dbt, 0);

This code is now equivalent to our SQL secondary selection:
 
       SELECT * FROM employees  WHERE last_name="Mouse"
       

Iteration Over Multiple Records

So far, you've returned only a single record. SQL lets you return multiple records (in other words, all employees with lastname Mouse). How might you do that in Berkeley DB?

Let's consider two cases. In the first case, you'll look up a set of items by their key. In the second, you'll search the database looking for items by a non-keyed field.

Let's say that you want to look up all your employees with the last name of Mouse (and we suspect that there might be many of them). This means that the secondary index on last_name would have been created allowing duplicates. Before opening the database, you would configure it for duplicate support:

        sdbp->set_flags(sdbp, DB_DUP);
        
        ASSERT(sdbp->open(sdbp, NULL, "emp_lname.db", NULL, DB_BTREE,
                DB_AUTO_COMMIT | DB_CREATE | DB_THREAD, 0644) == 0);
       

Now, when you retrieve by this secondary, you probably want to use a cursor to do it. You would begin with the same code that you used before, but you would add a loop to iterate over all the items that share the same secondary key:

        DBT     key_dbt, data_dbt;
        DBC     *sdc;
        emp_data        *edata;

        /* We'd like to look up by Mickey's last name. */
        memset(&key_dbt, 0, sizeof(key_dbt));
        key_dbt.data = "Mouse";
        key_dbt.size = strlen((char *)key_dbt.data);

        /*
         * We want the data and primary key returned, so we need only
         * initialize the DBTs for them to be returned.
         */
        memset(&data_dbt, 0, sizeof(data_dbt));
        memset(&pkey_dbt, 0, sizeof(pkey_dbt));

        /* Now, create a cursor. */
        sdbp->cursor(sdbp, NULL, &sdbc, 0);

        /* Now loop over all items with the specified key. */
        for (ret = sdbc->pget(sdbc, &key_dbt, &pkey_dbt, &data_dbt, DB_SET);
            ret == 0:
            ret = sdbc->pget(sdbc, &key_dbt, &pkey_dbt, &data_dbt, DB_NEXT_DUP) {

            /* Do per-record processing in here. */
        }


You initialize the cursor by asking it to find the first item with the specified key, and then you iterate over all the items in the database with the same key.

Another possible form of keyed iteration comes in the form of queries such as

SELECT * FROM employees  WHERE id >= 1000000 AND id < 2000000


Once again, you'll use a cursor to iterate, but this time, you'll want to establish a starting and stopping point. Berkeley DB makes the starting point easy; the stopping point is left up to the application.

        DBT     key_dbt, data_dbt;
        DBC     *dc;
        emp_key         ekey;

        /* Set the starting point. */
        memset(&key_dbt, 0, sizeof(key_dbt));
        ekey = 1000000;
        key_dbt.data = &ekey;
        key_dbt.size = sizeof(ekey);
        key_dbt.flags = DB_DBT_USERMEM;
        key_dbt.ulen = sizeof(ekey);

        memset(&data_dbt, 0, sizeof(data_dbt));

        /* Now, create a cursor. */
        dbp->cursor(dbp, NULL, &dbc, 0);

        /* Now loop over items starting with the low key. */
        for (ret = dbc->get(dbc, &key_dbt, &data_dbt, DB_SET_RANGE);
            ret == 0:
            ret = dbc->get(dbc, &key_dbt, &data_dbt, DB_NEXT)) {

            /* Check if we are still in the range. */
            if (ekey >= 2000000)
                break;
        
            /* Do per-record processing in here. */
        }


The two main things to note are 1) that you begin the loop with the DB_SET_RANGE flag, which positions the cursor on the first item greater than or equal to the specified key, and 2) that the application must check for the end of the range inside the loop.

Also, note that you set the DB_DBT_USERMEM flag in the key_dbt, indicating that the keys retrieved should be placed in the memory specified by the user. This lets you use the ekey variable to examine the key.

Let's wrap up the select section with a query that returns one or more items whose evaluation criteria is not a keyed field. Consider

SELECT * FROM  employees  WHERE state=ca


Since there is no key on the state field, you have no choice but to iterate over the entire database. This translates into a simple cursor iteration loop.

        DBC *dbc;
        DBT key_dbt, data_dbt;
        emp_data        *edata;

        dbp->cursor(dbp, &key_dbt, &data_dbt, &dbc, 0);
        memset(&key_dbt, 0, sizeof(key_dbt));
        memset(&data_dbt, 0, sizeof(data_dbt));

        for (ret = dbc->get(dbc, &key_dbt, &data_dbt, DB_FIRST);
                ret == 0;
                ret = dbc->get(dbc, &key_dbt, &data_dbt, DB_NEXT)) {


                /* See if the state field is "ca". */
                edata = data_dbt->data;
                if (strcmp(edata->state, "ca") == 0)
                        /* Keep this record. */

        }


This may seem inefficient, but if you do not have any indices on a field, you have no other option, and, in fact, this is precisely what your SQL database is doing internally when you specify a query that matches on an unkeyed field.

Removing Data

You've learned how to insert and change data and how to retrieve it. The last thing we need to cover is how to remove data. There are fundamentally two different ways to delete tuples from a database: if you know a key for the item you wish to remove (and it's not one of a set of duplicates items for that key), then you can do a keyed delete. If you do not know the key, then you can iterate and use a cursor delete. Let's start with the simple case, firing Mickey Mouse.

DELETE FROM employees  WHERE id= 0010002

This will look a lot like a retrieve, but we'll use the del method.

        DBT key_dbt;
        emp_key ekey;

        ekey = 0010002;
        memset(&key_dbt, 0, sizeof(key_dbt));
        key_dbt.data = &ekey;
        key_dbt.size = sizeof(ekey);

        dbp->del(dbp, NULL, &key_dbt, 0);

Alternatively, you might want to fire all employees with the last name of Mouse. The same technique works for that, since you have a secondary index on last name:

DELETE FROM employees  WHERE last_name = "Mouse"

        DBT key_dbt;

        memset(&key_dbt, 0, sizeof(key_dbt));
        key_dbt.data = "Mouse";
        key_dbt.size = strlen(key_dbt.data);

        dbp->del(dbp, NULL, &key_dbt, 0);


But perhaps this is too harsh. Perhaps you didn't want to fire Mickey; you really only wanted to fire Minnie Mouse. Is there a way that we can easily fire Minnie? In other words, how do you do this:

DELETE FROM  employees  where last_name =  "Mouse" AND first_name = "Minnie"

Not surprisingly, you'll need a cursor to iterate over our data items and select the one you want to delete.

        DBT key_dbt, data_dbt;
        DBC *sdbc;

        sdbp->cursor(sdbp, NULL, &sdbc, 0);

        memset(&key_dbt, 0, sizeof(key_dbt));
        key_dbt.data = "Mouse";
        key_dbt.size = strlen(key_dbt.data);

        for (ret = sdbc->get(sdbc, &key_dbt, &data_dbt, DB_SET);
            ret == 0;
            ret = sdbc->get(sdbc, &key_dbt, &data_dbt, DB_NEXT_DUP)) {
                edata = data_dbt->data;
                if (strcmp(edata->first_name, "Minnie") == 0) {
                        /* OK, this is a record we want to delete. */
                        sdbc->del(sdbc, 0);
                }
        }

        dbp->del(dbp, NULL, &key_dbt, 0);


By this time, you should have an overview of how to write Oracle Berkeley DB functions to perform basic SQL commands. Berkeley DB also has a large number of options and configurations that provide more-complicated functionality. Now we'll cover just one more topic: enclosing database operations in transactions.

Managing Transactions

Let's review how transactions work in (most) SQL implementations. Whenever you issue a DML statement in SQL, it becomes part of the current transaction. Each subsequent statement also runs as part of that transaction. The current transaction commits either when the SQL session ends or when an application issues a COMMIT statement. At any point, a transaction may be aborted by issuing the ROLLBACK statement.

Many SQL implementations also include an AUTOCOMMIT feature, where every DML statement is treated as its own transaction. When AUTOCOMMIT mode is enabled, the sequence

        statement 1
        COMMIT
        statement 2
        COMMIT
        statement 3
        COMMIT

is identical to
        statement 1
        statement 2
        statement 3


Berkeley DB also lets you encapsulate database operations in transactions. Unlike SQL, you can also run Berkeley DB without transactions. In fact, unless you explicitly request transactions, you will be running without them. So, how do you tell Berkeley DB that you'd like to use transactions?

If you recall, there are flags that can be specified when you open an environment:

        DB_ASSERT(dbenv->open(dbenv,  "my_databases/personnel",
                DB_CREATE | DB_INIT_LOCK |  DB_INIT_MPOOL | DB_INIT_TXN | DB_THREAD,
                0644);


Those flags configure Berkeley DB for your application. In this case, transactions were enabled because the DB_INIT_TXN flag was specified. Had this flag been omitted, then the application would run without transactions.

Berkeley DB provides a feature analogous to SQL's AUTOCOMMIT. You can configure an entire database (environment) to always autocommit, using the set_flags method off of the environment handle:

        dbenv->set_flags(dbenv, DB_AUTOCOMMIT, 1);


Alternatively, you can specify DB_AUTOCOMMIT on a database open, causing all subsequent operations to which you don't explicitly pass a transaction to be run in a transaction.

But let's say that you don't want autocommit; you want your application to be able to group operations into a logical transaction. For example, let's say that you want to add Mickey Mouse and also assign him a manager.

INSERT INTO employees VALUES (00010002, "mouse", "mickey", 1000000.00,
        "Main Street", "Disney Land", "CA", 98765);

INSERT INTO manages(00000001, 000100002)
COMMIT


(The above indicates that the employee with id=00000001 will manage Mickey.)

We'll assume that you know how to perform the data operations, and we'll focus on how to specify the transaction.

First, you have to explicitly begin a transaction (unlike in SQL). Creating a transaction is an environment operation, so it's a method off of the environment handle. This method will create a transaction handle (DB_TXN).

        DB_TXN *txn;

        dbenv->txn_begin(dbenv, NULL, &txn, 0);


Now that you have a transaction handle, you can pass it to any database operation that you want to be part of the transaction:

        emp_dbp->put(emp_dbp, txn, &key, &data, 0);
        man_dbp->put(man_dbp, txn, &key, &data, 0);


Then, you can either commit or abort the transaction by calling the appropriate method off of the transaction handle.

To commit a transaction:

      txn->commit(txn, 0);


To abort the transaction:

      txn->abort(txn);


Both methods are destructors, rendering the transaction handle unusable.

Unlike SQL, Berkeley DB can transaction-protect DDL operations as well. Therefore, you can also pass DB_TXN handles to operations like dbenv->dbremove, dbenv->dbrename, and dbp->open(… DB_CREATE …). In each of these cases, the DDL operation will be performed in the context of the specified transaction, which means it can be committed and aborted, just like any other transaction.

Conclusion

Oracle Berkeley DB provides the same kinds of functionality that you find in a SQL database, however, it provides it in a very different package. You write programmatic code to call APIs and the entire database is "embedded" directly in your application; that is, they run in the same address space. This usually provides an order of magnitude performance improvement. Achieving this benefit places a greater burden on the application. This is typically most useful when an application demands extraordinarily high performance or the data manipulated by the application is not inherently relational.


Margo Seltzer was one of the original authors of Berkeley DB and co-founded Sleepycat Software. She is also the Herchel Smith Professor of Computer Science and a Harvard College Professor in the Harvard School of Engineering and Applied Sciences. Her research interests include file systems, databases, and transaction processing systems.