High-Throughput Message Processing

Priority message handling overview

Devices such as network routers and firewall appliances process massive volumes of data. Data typically travels from source to destination through a multi-stage network. At each stage, a router receives data packets from the previous router and determines the next router to which each packet should be forwarded so that it reaches its final destination. The router must also handle priority requirements (e.g., voice packets may take priority over data packets) and maintain state information about neighboring routers (e.g., availability and congestion level) in order to deliver data reliably and efficiently.
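The priority requirement can be pictured as an outbound queue ordered first by traffic class and then by arrival order. The following is a minimal Python sketch using only the standard library's heapq module; it does not use Berkeley DB, and the class name and the priority values for "voice" and "data" are hypothetical.

```python
import heapq
import itertools

# Hypothetical priority classes: a lower number is served first.
PRIORITY = {"voice": 0, "data": 1}


class PriorityOutboundQueue:
    """Outbound queue that releases higher-priority packets first and
    preserves arrival (FIFO) order among packets of equal priority."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker that records arrival order

    def push(self, kind, payload):
        # The unique sequence number means ties never compare kind/payload.
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._seq), kind, payload))

    def pop(self):
        _, _, kind, payload = heapq.heappop(self._heap)
        return kind, payload

    def __len__(self):
        return len(self._heap)


q = PriorityOutboundQueue()
for kind, payload in [("data", "d1"), ("voice", "v1"), ("data", "d2"), ("voice", "v2")]:
    q.push(kind, payload)
order = [q.pop()[1] for _ in range(len(q))]
```

The sequence counter is what keeps equal-priority packets in arrival order; without it, heapq would fall back to comparing payloads.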

Similarly, a firewall appliance receives data packets, looks up each packet's source in a database of safe and unsafe sites, and either forwards the packet if it comes from a safe site or discards it (or issues a warning) if it comes from an unsafe site.
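The firewall's decision can be sketched as a lookup followed by a small rule. In this minimal Python sketch the site lists are hypothetical in-memory sets (using documentation-only IP addresses) standing in for the appliance's database, and the handling of sites that appear on neither list is an assumption, since the text does not cover that case.

```python
from enum import Enum


class Action(Enum):
    FORWARD = "forward"
    DISCARD = "discard"
    WARN = "warn"


# Hypothetical site lists; a real appliance would query its safe/unsafe-site database.
SAFE_SITES = {"198.51.100.7", "203.0.113.20"}
UNSAFE_SITES = {"192.0.2.66"}


def disposition(source, warn_on_unsafe=False):
    """Decide what to do with a packet, given the site it came from."""
    if source in SAFE_SITES:
        return Action.FORWARD
    if source in UNSAFE_SITES:
        # The text allows either discarding or warning for unsafe sites.
        return Action.WARN if warn_on_unsafe else Action.DISCARD
    # Assumption: unlisted sites are not covered by the text;
    # this sketch conservatively discards them.
    return Action.DISCARD
```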

Data processing requirements

All these usage scenarios follow the same general pattern of operation. As each data item is received, it is written to memory in an incoming queue (or another appropriate data structure). The packet header is then examined, various reference repositories are queried to determine the destination, priority, or disposition of the data, and the data item is added to the appropriate outbound queue.
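The receive, examine, look up, and enqueue steps can be sketched in a few lines of Python. This is a minimal in-memory illustration, not Berkeley DB code: the packet fields (dst, kind, body), the routing and priority tables, and the "unroutable" queue for unknown destinations are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical reference tables; a real device would keep these in indexed databases.
ROUTES = {"10.1.0.9": "router-b", "10.2.0.4": "router-c"}  # destination -> next hop
PRIORITY = {"voice": 0, "data": 1}                         # traffic class -> priority (0 = highest)


def process(incoming):
    """Drain the incoming queue, placing each packet on an outbound queue.

    Outbound queues are keyed by (next hop, priority) and keep arrival order.
    """
    outbound = defaultdict(deque)
    while incoming:
        packet = incoming.popleft()                      # take the oldest received item
        dst, kind = packet["dst"], packet["kind"]        # examine the header fields
        # Assumption: unknown destinations go to an "unroutable" queue.
        next_hop = ROUTES.get(dst, "unroutable")         # query the routing table
        priority = PRIORITY.get(kind, PRIORITY["data"])  # query the priority table
        outbound[(next_hop, priority)].append(packet)    # add to the outbound queue
    return outbound


# Packets are written to the incoming queue as they arrive.
incoming = deque([
    {"dst": "10.1.0.9", "kind": "data", "body": "p1"},
    {"dst": "10.1.0.9", "kind": "voice", "body": "p2"},
    {"dst": "10.9.9.9", "kind": "data", "body": "p3"},
])
queues = process(incoming)
```

Each step in the loop corresponds to a table access, which is why the pattern maps naturally onto a database with incoming, outgoing, and reference tables.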

The astute reader will immediately notice that this is fundamentally a database application, involving tables for incoming and outgoing data items as well as tables for reference information such as routes, priorities, and safe and unsafe sites. A database system can also satisfy the other important requirements of such devices. Very high throughput and low latency are obvious requirements: a router might process tens of thousands of packets per second, and the time required to process each packet must be extremely low (on the order of milliseconds) to meet quality-of-service guarantees. The data processing system must also be highly available and self-managing, because it is not feasible to have a human operator or administrator manage these devices.
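A quick calculation shows what "tens of thousands of packets per second" implies for per-packet cost. Assuming an arrival rate R (packets per second) and N packets processed in parallel, the average processing time t per packet must satisfy the bound below to keep up; the figures R = 20,000 and N = 1 or 4 are illustrative assumptions, not measurements.

```latex
t_{\text{avg}} \le \frac{N}{R}, \qquad
R = 2\times10^{4}\ \text{packets/s}:\quad
N = 1 \;\Rightarrow\; t_{\text{avg}} \le 50\ \mu\text{s}, \qquad
N = 4 \;\Rightarrow\; t_{\text{avg}} \le 200\ \mu\text{s}
```

For these assumed figures, the throughput requirement, rather than the millisecond latency target, sets the tighter per-packet budget, which is why multi-processor scaling matters.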

How Berkeley DB addresses these needs

Berkeley DB is an ideal database system for addressing these requirements. It is a lightweight, embeddable database whose APIs cover both data manipulation and database administration. The embedding application uses these APIs to read and write data and to perform maintenance operations such as checkpoints, backups, and recovery. In short, both data manipulation and database management are completely encapsulated in the application.

Berkeley DB supports concurrent data manipulation through fine-grained locking and multi-version concurrency control (MVCC). It also provides ACID transaction semantics, so the application can sustain high-throughput, reliable operation even in the presence of thread, process, and disk failures.

To support low-latency lookups, Berkeley DB offers a wide range of access methods, including B-tree, hash, queue, recno, and heap, so the application can choose the one that fits its requirements. A B-tree index supports exact-match lookups as well as range retrievals (e.g., find records whose search key is between ‘a’ and ‘m’), whereas a hash index supports very fast exact-match lookups but not ordered range retrievals. Queue, recno, and heap are specialized access methods designed for specific requirements; for example, queue stores fixed-length records keyed by logical record number, which suits FIFO-style message queues. In general, B-tree indexing satisfies the vast majority of application requirements. Berkeley DB is also designed to scale across multiple processors, the norm in modern hardware.
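The difference between ordered and hashed lookups can be illustrated without Berkeley DB. In this minimal Python sketch, a sorted key list searched with the standard bisect module stands in for a B-tree (it shares the key ordering but not the on-disk page structure), and a plain dict stands in for a hash index. The class name and sample keys are hypothetical. In Berkeley DB's C API, a comparable range scan on a B-tree database is typically done by positioning a cursor with the DB_SET_RANGE flag and iterating forward.

```python
import bisect


class SortedIndex:
    """Stand-in for a B-tree: keys kept in sorted order, so exact-match and
    range lookups both start with a binary search."""

    def __init__(self, items):
        self._keys = sorted(items)
        self._vals = [items[k] for k in self._keys]

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._vals[i]
        return None

    def range(self, lo, hi):
        """Return (key, value) pairs with lo <= key < hi, in key order."""
        i = bisect.bisect_left(self._keys, lo)  # smallest key >= lo
        j = bisect.bisect_left(self._keys, hi)  # first key >= hi (excluded)
        return list(zip(self._keys[i:j], self._vals[i:j]))


records = {"alpha": 1, "kilo": 2, "mike": 3, "november": 4, "zulu": 5}
btree_like = SortedIndex(records)
hash_index = dict(records)  # exact match only; iteration order is not key order

# Keys beginning with 'a' through 'm': use the half-open range ['a', 'n').
in_range = btree_like.range("a", "n")
```

The half-open upper bound is a design choice: using "n" rather than "m" ensures keys such as "mike" that sort after "m" are still included.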

Berkeley DB also supports replication-based high availability by automatically maintaining multiple copies of the database on separate machines with independent failure modes. If a machine fails, Berkeley DB supports transparent failover to a surviving machine, ensuring uninterrupted operation and high quality-of-service guarantees.

Berkeley DB value proposition

Berkeley DB is a mature, robust database solution that has been used for years in devices such as routers and firewall appliances by large, well-established, satisfied customers. Arguably, it is the most widely used embeddable database today. Customers value the reliability, performance, ease of use, and flexibility of Berkeley DB, as well as the ability to get commercial support for their applications. Berkeley DB is a solution you can rely on for a wide variety of high-performance, high-availability enterprise database applications.

“Speed, reliability and ease-of-use are important considerations for telecommunication infrastructure equipment. We’re pleased that Oracle continues to improve Berkeley DB for the telecommunications industry. Openwave relies on Berkeley DB as the core message store in Email Mx, a messaging platform that handles more than 1.5 billion messages a day.” Rich Wong, SVP Products and Solutions Group, Openwave.

The following two sample applications illustrate many of the relevant features of Berkeley DB discussed in this paper.

Sample firewall application: www.oracle.com/technetwork/database/berkeleydb/learnmore/bdbfirewallexample-2488322.zip

Sample priority message handling application: