AT ORACLE: Interview
Big Data Management
By Rich Schwerin
Oracle NoSQL Database facilitates efficient storage of massive amounts of data in a simple, flexible format.
Following the announcement of the availability of Oracle NoSQL Database, Rich Schwerin, Oracle Magazine contributor, sat down with Dave Segleau, director of product management at Oracle, to talk about the new offering for big data management. The following is an excerpt from that interview. Download the full podcast at oracle.com/magcasts.
Oracle Magazine: Let’s start at the beginning. What is a NoSQL database?
Segleau: NoSQL stands for “not only SQL,” and it encompasses a set of database technologies that have been under development for the past 12 years. NoSQL databases in general try to address some of the data management requirements of what the industry calls big data. In very general terms, a NoSQL database is a nonrelational database that can manage data over a distributed set of storage servers, is designed to be highly available and highly scalable, and supports variable data schemas and data formats. NoSQL databases often avoid ACID [atomic, consistent, isolated, and durable] transactions and table joins in order to achieve faster throughput. There are several different kinds of NoSQL databases, and each implementation tends to have its own particular set of technical features and behavior. What makes NoSQL hard to define is that there are no standards for it today. There are literally hundreds of products claiming to be NoSQL databases or having NoSQL capabilities.
Oracle Magazine: When would a developer choose a NoSQL database?
Segleau: The most common use cases involve Web or internet-centric applications—what we like to call Web-scale applications or Web services, in the broadest sense. These applications are providing either data capture or data services over the Web.
Oracle Magazine: What are some of the pros and cons associated with NoSQL databases?
Segleau: The pros include the ability to scale out compute and storage capacity horizontally over a wide range of hardware resources, simple and fast queries, and a flexible and simple approach to schema management. The cons include a lack of support for complex queries, a lack of support for multitable joins, limited transaction support, and having to learn a new database technology approach.
Oracle Magazine: You mentioned several different kinds of NoSQL databases. What kind of database is Oracle NoSQL Database?
Segleau: Oracle NoSQL Database is a distributed key-value database, like the ones currently used at LinkedIn and Amazon.com. The key might be the user or membership ID, and the value contains some information about that user—for example, basic profile information including address, picture, and other vital information. Other records associated with that key might contain the user IDs or e-mail addresses of friends and the products that the user has recently purchased.
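The user-profile example can be sketched as plain key-value records. The Python below is a minimal illustration, not the Oracle NoSQL Database API: it assumes, hypothetically, that each key is a (user ID, record kind) tuple and that values are ordinary Python objects. The names `store`, `put`, and `records_for_user`, and the sample data, are invented for this example.

```python
from typing import Any, Dict, Tuple

# Hypothetical key layout for illustration only: (user_id, record_kind).
Key = Tuple[str, str]

# Each dict entry is one key paired with one value.
store: Dict[Key, Any] = {}

def put(user_id: str, kind: str, value: Any) -> None:
    """Store one record; the value can be simple or structured."""
    store[(user_id, kind)] = value

def records_for_user(user_id: str) -> Dict[str, Any]:
    """Collect every record whose key starts with the given user ID."""
    return {kind: value for (uid, kind), value in store.items() if uid == user_id}

# Sample records: a profile, a friends list, and recent purchases for one user.
put("u1001", "profile", {"address": "123 Main St", "picture": "u1001.jpg"})
put("u1001", "friends", ["u2002", "u3003"])
put("u1001", "purchases", ["book-42", "lamp-7"])
put("u2002", "profile", {"address": "9 Elm Rd", "picture": "u2002.jpg"})
```

Scanning every key, as `records_for_user` does, is only workable for a small in-memory dict; a distributed store needs to know which server holds a given key without looking everywhere.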
If you’re an RDBMS person, you can think of a key-value database as the simplest form of a two-column relational table: the first column is the key, and the second column is the value. Keys and values can be very simple values or complex structures. Oracle NoSQL Database stores records that contain a key-value pair and retrieves records based on the requested key. Oracle NoSQL Database distributes those key-value records, based on the hashed value of the key, across any number of servers that we call storage nodes. The database is designed to scale out to many systems as your data management needs grow and provides many of the features common to other NoSQL database implementations, as well as providing several key features that are not available in other NoSQL products.
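Distributing records across storage nodes by the hashed value of the key can be sketched in a few lines. This Python class is a toy model under stated assumptions, not Oracle's implementation: each storage node is simulated by an in-memory dict, keys are strings, and a record's node is chosen by taking an MD5 hash of the key modulo the number of nodes. The class and method names are hypothetical.

```python
import hashlib
from typing import Any, Dict, List, Optional

class HashPartitionedStore:
    """Toy store that places each record on a node chosen by hashing its key."""

    def __init__(self, num_nodes: int) -> None:
        if num_nodes < 1:
            raise ValueError("num_nodes must be at least 1")
        # Each "storage node" is simulated by an in-memory dict.
        self.nodes: List[Dict[str, Any]] = [{} for _ in range(num_nodes)]

    def node_for(self, key: str) -> int:
        # MD5 gives the same hash in every process; Python's built-in hash()
        # is randomized for strings, so it would place keys differently per run.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key: str, value: Any) -> None:
        self.nodes[self.node_for(key)][key] = value

    def get(self, key: str) -> Optional[Any]:
        # Only the one node that owns the key is consulted.
        return self.nodes[self.node_for(key)].get(key)
```

Hashing modulo the node count is the simplest placement rule, but changing the number of nodes then relocates a large fraction of keys. Production systems commonly hash keys into a fixed number of partitions and assign partitions to nodes instead, so that growing the cluster moves whole partitions; treat the modulo rule here as a simplification.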
Oracle Magazine: What are some of those key features?
Segleau: There are several key features that I’d like to highlight, but what it boils down to is that Oracle NoSQL Database is general purpose, as well as simple to use and deploy. Lots of the existing NoSQL products are specially tuned for specific kinds of problems. The issue is that this approach doesn’t adapt well to other types of problems. For example, Dynamo—Amazon’s NoSQL database—is very good for Amazon’s requirements because Amazon wrote it. But most customers are not Amazon, and what they want is a more general-purpose solution that will address their NoSQL database needs.
A common complaint is that many of the existing NoSQL products discard fundamental database technology, such as transactions, in order to run fast, pushing those fundamental requirements onto the application developer. With Oracle NoSQL Database, that functionality remains in the database where it belongs. Quite frankly, we heard from several existing NoSQL users that concepts such as high throughput without transactions and eventual consistency were interesting theoretical models, but that they made application development a nightmare. Hence, Oracle NoSQL Database has flexible, ACID transactions.
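The value of keeping transactions in the database, rather than pushing them onto the application, can be illustrated with an all-or-nothing batch of writes. The Python sketch below is hypothetical and is not the Oracle NoSQL Database API: it applies a list of writes to an in-memory dict and, if any write's precondition fails, restores the previous values so that no partial update survives. It demonstrates atomicity only; durability, isolation between concurrent clients, and distribution are out of scope, and the names `execute_atomically` and `AbortError` are invented.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# A write is (key, new_value, precondition). The precondition receives the
# current value (or None if absent) and must return True for the write to proceed.
Write = Tuple[str, Any, Optional[Callable[[Any], bool]]]

class AbortError(Exception):
    """Raised when a precondition fails; the whole batch is rolled back."""

def execute_atomically(store: Dict[str, Any], writes: List[Write]) -> None:
    """Apply every write, or none of them if any step fails."""
    undo: List[Tuple[str, bool, Any]] = []  # (key, existed_before, old_value)
    try:
        for key, new_value, precondition in writes:
            current = store.get(key)
            if precondition is not None and not precondition(current):
                raise AbortError(f"precondition failed for {key!r}")
            undo.append((key, key in store, current))
            store[key] = new_value
    except Exception:
        # Undo in reverse order so keys written more than once end up restored
        # to the value they had before the batch started.
        for key, existed, old_value in reversed(undo):
            if existed:
                store[key] = old_value
            else:
                store.pop(key, None)
        raise
```

For example, `execute_atomically(accounts, [("alice", 70, lambda v: v >= 30), ("bob", 50, None)])` either updates both balances or, if the check on `alice` fails, leaves `accounts` exactly as it was. Without such support, every application would need to write and test this rollback logic itself.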
Oracle Magazine: How does Oracle NoSQL Database fit into Oracle’s big data strategy?
Oracle Magazine: How do you organize and analyze the data that Oracle NoSQL Database acquires?
Segleau: Oracle NoSQL Database stores the distributed key-value pairs in storage nodes across a wide set of systems. For simple statistics, especially things like counts and other scalar aggregates, you can use Hadoop MapReduce processes to quickly generate statistics that are useful to the application. For deeper, more-complex analysis, typically you’ll want to move the data of interest into an Oracle data warehouse and then use the rich set of tools and processes, including Oracle R Enterprise, that are available there to generate more-complex, multifaceted results. There are a variety of ways of moving data from Oracle NoSQL Database into an Oracle data warehouse, including Oracle Data Integrator, Hadoop MapReduce processes, and even in-database MapReduce that can pull NoSQL data directly into a query in the data warehouse. In other words, through a SQL function (in this case, a MapReduce SQL function), data from an external source that MapReduce can access is extracted directly into a query running in the Oracle database.
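A count of the kind Segleau describes fits the MapReduce pattern: a map step runs next to each slice of the data and emits partial results, and a reduce step merges them. The Python below only mimics that shape in a single process; it does not use Hadoop, and the record layout, the `country` field, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# A record is (key, value); values are dicts with a hypothetical "country" field.
Record = Tuple[str, Dict[str, str]]

def map_and_combine(partition: Iterable[Record]) -> Counter:
    """Map each record to its country and pre-sum counts locally (a combiner)."""
    return Counter(value.get("country", "unknown") for _key, value in partition)

def reduce_counts(partial_counts: Iterable[Counter]) -> Dict[str, int]:
    """Merge the per-partition counts into global totals."""
    totals: Counter = Counter()
    for partial in partial_counts:
        totals.update(partial)  # Counter.update adds counts rather than replacing them
    return dict(totals)

def count_by_country(partitions: List[List[Record]]) -> Dict[str, int]:
    # Each inner list stands in for the records held by one storage node.
    return reduce_counts(map_and_combine(p) for p in partitions)
```

Pre-summing on each partition means only one small tally per node has to reach the reducer, rather than one pair per record, which is part of why counts and other scalar aggregates suit this approach.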
Rich Schwerin is a senior manager with Oracle Publishing who focuses on social media.