Oracle Big Data Essentials Online Forum - Chat Transcript (Raw)

From Feb. 16, 2012
 

Question Response
Big Data Appliance and Big Data Connectors
How is Big Data Appliance different from Exadata? This engineered system is designed for Hadoop and NoSQL workloads. More of this is covered in the Organize section. While it shares characteristics with Exadata (InfiniBand connectivity, for example), the processors and storage are balanced for Hadoop workloads, and the software stack includes Cloudera's CDH and Management Suite as well as other products used in big data analysis.
Do you need to buy appliance hardware for big data solution? No. Oracle Big Data Connectors, Oracle R Enterprise, and Oracle NoSQL DB are available independent of the Big Data Appliance. Cloudera Software is also available from Cloudera directly independent of the BDA.
Is oracle 11g needed to analyze big data? I mean, is 11g only target for reduced data? Big Data Connectors Oracle Loader for Hadoop and Oracle Direct Connector for HDFS only support Oracle Database 11.2.0.2 and above as a target for the reduced data.
Can i use the oracle loader for hadoop w/o a big data appliance? re: to an existing hadoop cluster not running oracle software/hardware Yes, Oracle Loader for Hadoop and the other components of Oracle Big Data Connectors can be used without the Oracle Big Data Appliance.
during the analyze stage - do you "talk" to hadoop or it's only within RDBMS? The Oracle Big Data Connectors can be used to access data from Hadoop for analysis in the database.
you said that during data analysis i can work on RDBMS and hadoop data. I wonder how exactly the "communication" with data residing on hadoop occurs? for example, if i use a simple analytical SQL query - how can i include the data residing on hadoop? You can use Oracle Loader for Hadoop to load data from Hadoop into Oracle Database, and/or use Oracle Direct Connector for Hadoop Distributed File System (HDFS) to access data on HDFS via external tables, so that the data on HDFS can be queried using SQL in Oracle Database and imported into the database as necessary.
Is the Oracle Big Data Appliance available for download on OTN? Some of the software is (for example the Big Data Connectors). But not the appliance itself.
What's Oracle Big Data Appliance? An engineered system for Hadoop. More info at oracle.com/bigdata and click on the big data appliance in the organize section.
What SW installed on the Big Data Appliance is available stand-alone/without Appliance HW? Oracle Linux, Java, NoSQL Database, Oracle Big Data Connectors. The Cloudera distribution is also available separately from BDA from Cloudera.
Can R code statistical programs be used with the Oracle/Hadoop system? Open source R is shipped with the Big Data Appliance
What are you using for Management/Monitoring? Oracle Big Data Appliance includes the Cloudera Management Suite.
Is it a licensed product, or is it like Linux, where you pay for support? You can get pricing information at oracle.com/pricing
Are you going to present big data appliance to gold/platinum partners? is there any partner specific information that will be shared? Yes. There are plans underway for OPN training/materials on BDA.
Oracle Exadata and Oracle Exalytics
Do you need the Exadata machine to use Oracle Data Integrator? No.
Is this appliance different from Exadata? Yes, it is.
do you need exadata to use exalytics? They work well together and also can be used independently.
Hadoop
How is this different from MPP databases? Hadoop is a form of MPP system, but the jobs run independently, so there is no locking, no contention and no hidden dependencies. The work is laid out as a Directed Acyclic Graph of tasks. There is no need for state management.
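As a rough illustration of the independent-task model described in the answer above, here is a minimal word-count sketch in plain Python. It is not Hadoop code: the names `map_task`, `shuffle` and `reduce_task` are hypothetical, and the two input splits are made-up sample data. The point is only that each map task sees its own split and each reduce task sees its own key, so no locking or shared state is needed.

```python
from collections import defaultdict

def map_task(split):
    """Emit (word, 1) pairs for one input split; needs no shared state."""
    return [(word.lower(), 1) for line in split for word in line.split()]

def shuffle(mapped_outputs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_task(key, values):
    """Combine all values for one key; independent of every other key."""
    return key, sum(values)

# Two hypothetical input splits; each could be processed on a different node.
splits = [["big data big"], ["data appliance"]]
mapped = [map_task(split) for split in splits]
counts = dict(reduce_task(key, values) for key, values in shuffle(mapped).items())
```

Data flows one way through the sketch (splits, then map, then shuffle, then reduce), which is the acyclic structure the answer refers to.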
Is Oracle using Hadoop? The Oracle Big Data Appliance runs Hadoop.
Will Hive and Pig be included in the Oracle Big Data solution after Cloudera gets into the picture? If not, will Oracle develop its own tools with comparable functionalities? Oracle ships the Cloudera distribution including Apache Hadoop with the big data appliance. Hive and Pig are both part of that.
Organize phase talks about using Hadoop and HDFS as temporary database and then load it back to Oracle database for analysis. Is there a way that Oracle's BI tools (OBIEE) can connect directly to HDFS? On a supported version of Linux, it should be possible to have read access to in-place HDFS data using Oracle Direct Connector for HDFS. The Direct Connector uses the Oracle Database External Table mechanism, so the HDFS data should appear like any other database content to the BI Server.
please correct me if i'm mistaken: so to analyse the data from hadoop from RDBMS i have 2 options - a. pull it into rdbms or b. store it in external tables. Correct? For clarity: you don't actually store data in external tables; you configure the External Table feature and access data in HDFS through the External Table mechanism. External Tables contain metadata about the location of the data. That said, yes, these two options allow you to do analysis on data that is on HDFS along with data in RDBMS. There is also Open Source R in the Oracle Big Data Appliance and Oracle R Connector for Hadoop, part of Oracle Big Data Connectors.
pardon my ignorance , but should external tables reside on HDFS? and if the answer is "yes" does it mean that my sql to external table implies the map/reduce jobs running over the initial set of data? The External Table is a standard database feature. External Tables are defined on Oracle Database and we use an External Table mechanism to access the external data when the External table is queried. The SQL accesses the HDFS as external files; we do not use map/reduce. Map/reduce jobs on the Hadoop cluster or Oracle BDA can run against the HDFS data independently of the read-only database access.
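To make the "metadata only" idea in the answer above concrete, here is a small plain-Python sketch. It is an analogy, not the Oracle External Table implementation or the Direct Connector: the class `ExternalTableSketch`, the column names and the sample CSV file (a local stand-in for a file on HDFS) are all assumptions made for illustration.

```python
import csv
import os
import tempfile

class ExternalTableSketch:
    """Holds only metadata (file locations and column names), never the rows."""

    def __init__(self, locations, columns):
        self.locations = list(locations)
        self.columns = list(columns)

    def scan(self):
        """Read the external files at query time and yield rows as dicts."""
        for path in self.locations:
            with open(path, newline="") as handle:
                for record in csv.reader(handle):
                    yield dict(zip(self.columns, record))

# Local stand-in for a data file that would live on HDFS (made-up sample rows).
tmp_dir = tempfile.mkdtemp()
data_file = os.path.join(tmp_dir, "part-00000.csv")
with open(data_file, "w", newline="") as handle:
    csv.writer(handle).writerows([["m1", "42"], ["m2", "17"]])

table = ExternalTableSketch([data_file], ["meter_id", "reading"])
# The "query" reads the file in place; nothing was copied into the table object.
high = [row for row in table.scan() if int(row["reading"]) > 20]
```

The table object never holds rows, so the files can still be processed by other jobs independently of this read-only access.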
Oracle NoSQL Database
What is NoSQL Database? Oracle NoSQL Database is a commercial grade database using a key/value paradigm
How do you query a NoSQL database? Through the API provided by product
Are there any sample programs using Oracle's NoSQL to process these variety of data like weblogs, network data, etc? The Oracle NoSQL Database download includes some sample applications. We'll be adding more application examples to OTN over time. If you have a sample application that you'd like to share, please share it and let us know.
Can we store RDF data in this NoSQL DB? NoSQL Database is a distributed key-value database. Yes, you can store RDF data as key-value pairs in the database. But query and lookup APIs/behavior will be different.
Do you have any customers using NOSQL in production? We have several customers, partners and internal Oracle groups that are actively building projects with NoSQL Database. Since the NoSQL Database was released only 4 months ago, we do not have any customers in production at this time.
How scalable is Oracle NoSQL? How many nodes in a cluster can it support? We've designed Oracle NoSQL Database to be highly scalable and highly available. We've tested in-house with simple clusters of 3 nodes to large clusters of 200 nodes. We have not encountered any issues that would indicate that we'd have any problems scaling to systems with many hundreds of nodes.
How are concepts like indexing and queries applied to NoSQL DBs? NoSQL databases in general vary widely in terms of APIs, indexing and query language support. The Oracle NoSQL Database is a distributed key-value database, where the keys are stored in a distributed B-tree. The Oracle NoSQL Database has a Java API that supports simple get/put/delete operations. The get (search) operations support both specific key lookups as well as range queries.
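The operations named in the answer above (get, put, delete and range lookups over ordered keys) can be sketched with a small in-memory structure. This is not the Oracle NoSQL Database Java API; the class `SortedKVSketch` and its method names are hypothetical, and a sorted Python list stands in for the distributed B-tree.

```python
import bisect

class SortedKVSketch:
    """Single-process stand-in for a key-value store with ordered keys."""

    def __init__(self):
        self._keys = []      # kept sorted so range scans are cheap
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def delete(self, key):
        if key in self._values:
            del self._values[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def range(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(key, self._values[key]) for key in self._keys[lo:hi]]

store = SortedKVSketch()
store.put("user/001", b"alice")
store.put("user/002", b"bob")
store.put("user/010", b"carol")
store.delete("user/002")
hits = store.range("user/000", "user/005")
```

Because the keys are kept in order, a range query is a pair of binary searches plus a slice, which is why ordered key storage supports both point lookups and range scans.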
Does NoSQL DB have Oracle Label Security? Not at this time. We have had some requests for label-based security. Please contact us so that we can better understand your requirements.
How can I use NoSQL DBs in the context of ERP systems if I can? ERP systems are typically based on RDBMS queries. NoSQL databases can't really support complex queries and joins. However, there are aspects of ERP that could be enhanced/extended to include NoSQL DBs as a mechanism for capturing and managing data that is not currently captured/managed in the ERP system.
Will there be a Groovy/Grails API for NoSQL? A Rails API? There are no plans to add a Groovy/Grails API for NoSQL at this time, but we'd love to understand your requirements better. There are already some community-contributed APIs for NoSQL DB. You can find a pointer to them on the NoSQL Database OTN pages.
How does Oracle's NoSQL database compete with or complement Oracle Coherence? They mostly complement each other. NoSQL DB is a distributed persisted key-value database. Coherence is an in-memory distributed key-value grid. You would typically use Coherence if you have a data set that fits in memory and you need in-memory performance. You would typically use NoSQL DB if you had a very large data set that will predominantly be stored on disk, with a cache for fast B-tree access.
will the existing applications using rdbms require code change before using NOSQL DB ? Yes, both the APIs and the queries that are supported are different.
Oracle Data Warehouse? Is this regular Oracle 11G or NoSQL? Oracle Data Warehouse is Oracle 11G. The Oracle NoSQL Database is one of the storage options for capturing and managing your unstructured/semi-structured data.
If Oracle Big Data uses Hadoop, when does it use Berkeley DB? Oracle Berkeley DB Java Edition is used for the data storage nodes within the Oracle NoSQL Database. As outlined in the presentation, if you have a batch-oriented, write-once, read/process-in-bulk workload, then the best storage option for that functionality would be HDFS. If you have an OLTP-oriented, high-concurrency, record-oriented read/write workload, then the best storage option for that functionality would be Oracle NoSQL Database.
What is the advantage of using NoSQL DB (paid option) instead of Berkeley DB or MySQL (free options)? NoSQL Database is a distributed key-value database. BDB is a single-node key-value database. MySQL is a single/clustered SQL database. They are designed to address different needs.
Why should we use a BDB-based NoSQL database when HBase is already available with the distribution? We cover NoSQL Database in more detail in the 3rd session. But two general strengths are its general purpose nature (consistent performance over a wide range of workloads and database sizes) and simple programming model (developers don't have to deal with eventual consistency, for example). HBase differs from Oracle NoSQL Database in several important areas. HBase is a column-oriented database, whereas NoSQL Database is a key-value database. HBase uses HDFS for storage, whereas NoSQL Database uses a log-based storage system (leveraging Berkeley DB Java Edition). In general, you can think of HBase as a column-family store layered on top of files in HDFS, while NoSQL Database uses a distributed B+tree to store and manage the data. HBase (like other products based on HDFS) is going to do well with workloads where bulk reads/bulk writes are required. NoSQL Database is going to do well with workloads that require access to specific key-value pairs or sets of key-value pairs.
Last time i checked, the oracle nosql database wasn't available in a community version. is there an expected date for the community version (if it isn't already available)? The Oracle NoSQL Database Community Edition is available as part of the Oracle Big Data Appliance or as its own software-only download from OTN. The Oracle NoSQL Database Enterprise Edition is available separately, also on OTN. You can learn more at www.oracle.com/bigdata and www.oracle.com/technetwork/database/nosqldb/overview/index.html.
Does this mean everything can be or has to be defined as key/value - is it possible with unstructured data? Oracle NoSQL Database is a key/value database. Keys are strings, composed of a major and minor key component. The major and minor keys can be arbitrarily long, although it is recommended that they be kept small to ensure the best B-tree performance. Keys are associated with a value, which is a simple byte array that can either be a simple value or a complex serialized object with many elements. You would design your key-value schema in much the same way that you would for an RDBMS application -- basically structuring the Key(s) so that they support the most frequent application queries and structuring the Value to contain the unit of information that is relevant to the key that points to it and so that it contains the information that the application most commonly wants to be returned when querying for that key.
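The key design advice in the answer above can be illustrated with a short plain-Python sketch. Everything here is an assumption made for illustration: the helper names `make_key`, `put` and `get_by_major`, the `/-/` separator between major and minor parts, and the sample customer records. It does not use the Oracle NoSQL Database API.

```python
def make_key(major, minor=()):
    """Join major and minor components into one string key.

    The '/-/' separator between the two parts is an arbitrary choice here.
    """
    return "/".join(major) + "/-/" + "/".join(minor)

store = {}  # stand-in for the key-value store

def put(major, minor, value):
    store[make_key(major, minor)] = value  # value is an opaque byte array

def get_by_major(major):
    """Return every record that shares a major key, ordered by full key."""
    prefix = "/".join(major) + "/-/"
    return {key: value for key, value in sorted(store.items())
            if key.startswith(prefix)}

# Major key identifies the customer; minor key selects a piece of that customer.
put(["customer", "1001"], ["profile"], b"Ann;555-0100")
put(["customer", "1001"], ["orders", "2012-02-16"], b"total=42.50")
put(["customer", "1002"], ["profile"], b"Raj;555-0101")

ann = get_by_major(["customer", "1001"])
```

Structuring the major key around the most frequent lookup (here, "everything for one customer") is the key-value counterpart of choosing a primary key and indexes in an RDBMS schema.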
Will you cover topic like how NoSQL Database is better than something like Hadoop in this session? Not better, but different. We'll cover that in the third presentation
Is the integration work between Oracle NoSQL Database and (HDFS, mapreduce, pig and hive) already completed? There is an interface enabling Hadoop to pull data from NoSQL Database. More info at oracle.com/bigdata and click on NoSQL Database. Or search for NoSQL on OTN part of oracle.com
Is Oracle NoSQL an option of Oracle Enterprise Database or is it a separate database like mysql? How much does it cost? It's a separate product. Check oracle.com/bigdata and follow the link there. Much more detail on that product in the third session
Does NoSQL Database Enterprise Edition also run Hadoop ? Yes, you can use EE with Hadoop as well. It's the same API call in both Enterprise and Community Edition that enables a Hadoop process to access data stored in NoSQL.
is there any easy solution to convert a relational data base to no sql? No. That could be a complex, application specific task and a NoSQL database would not have many of the features (like complex joins) that are available in an RDBMS. NoSQL databases are quite different and how they fit into your overall application solution will vary significantly from application to application.
Can data in a NoSQL database be joined with a relational database? For example, store millions of rows of data from Smart Meters but be able to join it to customer data in a relational database. Yes. In fact, that is exactly what most Big Data applications do during the Analysis and Reporting phases of the application. There are typically two ways to accomplish this: A) move the relevant/interesting/most recent data from the NoSQL Database into the Data Warehouse/RDBMS system and perform the queries/joins there. This can be performed by using tools like the Oracle Data Integrator or the Oracle Loader for Hadoop. B) use the External Table feature of the Oracle Database/Data Warehouse to dynamically access data in the NoSQL Database as part of a SQL query (performed via a set of MapReduce jobs).
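Option A from the answer above (move the relevant key-value data into the relational system, then join there) might look roughly like the following sketch. It uses Python's built-in `sqlite3` as a stand-in for the data warehouse and a plain dict as a stand-in for the NoSQL Database; the key layout `meter/<customer_id>/<timestamp>`, the table names and the sample readings are all hypothetical, and no Oracle tool is involved.

```python
import sqlite3

# Hypothetical key-value records: key = "meter/<customer_id>/<timestamp>".
kv_records = {
    "meter/1001/2012-02-16T10:00": b"3.2",
    "meter/1001/2012-02-16T11:00": b"2.8",
    "meter/1002/2012-02-16T10:00": b"5.0",
}

db = sqlite3.connect(":memory:")   # stand-in for the data warehouse
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("CREATE TABLE readings (customer_id INTEGER, ts TEXT, kwh REAL)")
db.executemany("INSERT INTO customers VALUES (?, ?)",
               [(1001, "Ann"), (1002, "Raj")])

# Load step: turn each key and byte-array value into a relational row.
rows = []
for key, value in kv_records.items():
    _, customer_id, ts = key.split("/")
    rows.append((int(customer_id), ts, float(value.decode("ascii"))))
db.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)

# Once loaded, the meter data joins to customer data with ordinary SQL.
usage = db.execute(
    "SELECT c.name, SUM(r.kwh) FROM customers c "
    "JOIN readings r ON r.customer_id = c.id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
```

In practice only the relevant or most recent slice of the key-value data would be loaded, as the answer notes, rather than the full data set.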
Can it store java objects such as customer data (name, phone etc.)? Yes. The value field in the NoSQL Database is a byte array, which can contain simple values or complex serialized objects. The Java application that interacts with the NoSQL Database provides the serialization/deserialization methods for writing/reading the data structures to/from the database.
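As a minimal sketch of the application-side serialization described in the answer above, the following Python example turns a customer object into a byte array and back. The `Customer` class, the choice of JSON as the wire format and the sample values are assumptions for illustration; a Java application would supply its own serialization methods in the same way.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Customer:
    name: str
    phone: str

def serialize(customer):
    """Encode the object as the byte array that would be stored as the value."""
    return json.dumps(asdict(customer)).encode("utf-8")

def deserialize(raw):
    """Rebuild the object from the byte array read back from the store."""
    return Customer(**json.loads(raw.decode("utf-8")))

raw_value = serialize(Customer("Ann Lee", "555-0100"))
restored = deserialize(raw_value)
```

The store itself only ever sees `raw_value`; the structure of the object is known to the application alone.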
Please clarify why we cannot utilize existing Oracle Database or any other relational RDBMS in place of NoSQL. After all, Oracle can work with the file system as well: External Table or BFILE, for example. You can. NoSQL is an option that works well for certain customers and use cases. Pick the correct database according to your use case.
Will it be possible to migrate an existing database to a new NoSQL model? Depends on the database, schemas and use case.
What is the difference between NoSQL and a database like Cassandra or Bigtable? We cover NoSQL Database in more detail in the 3rd session. But two general strengths are its general purpose nature (consistent performance over a wide range of workloads and database sizes) and simple programming model (developers don't have to deal with eventual consistency, for example). Both Cassandra and Bigtable are column-family databases (data is stored in columns and column families). Neither is built on HDFS: Cassandra stores data on each node's local disks and defaults to an eventual consistency model, while Bigtable runs on the Google File System (HBase is the HDFS-based implementation of the Bigtable design). Oracle NoSQL Database is a distributed key-value database (data is stored in key-value pair records), uses the log-based storage system in Berkeley DB Java Edition and supports configurable ACID transactions. How those differences in implementation will impact your application largely depends on the specific characteristics and requirements of the application.
Do you see legitimate use cases for NoSQL databases used by more traditional applications? Or is NoSQL only good for ingesting Big Data streams? We will cover NoSQL Database use cases in more detail in the third session. In general, when most people talk about "traditional applications" they mean applications that use SQL to access an RDBMS. The bottom line is that it depends on the application requirements. If the application requires SQL, complex joins, stored procedures, etc. that are features of an RDBMS, but are not features in a NoSQL database, then that application is not a good use case for a NoSQL database. If, however, the application does not need RDBMS-specific features, but specifically requires the features that are in the NoSQL Database, then it would be a good NoSQL use case. Please review the presentations about NoSQL and the use cases around it, and consider your question from the perspective of the application requirements and the type of use case that is best suited to a non-relational data storage methodology.
How do you move data from nosql to oracle database? Export it to Hadoop and use MapReduce and Big Data Connectors as needed.
Will "hints" work in NoSQL databases? No. Most NoSQL databases do not support a query language, much less "hints", which are a specific feature of SQL/relational databases.
What are Oracle's plans for integrating NoSQL and making it work more easily with Oracle DB? There are already methods that allow you to exchange data between the NoSQL Database and the Oracle Database, which are part of the Big Data Connectors. In future releases we will be enhancing and improving those interfaces. We will also be providing additional features and functionality based on feedback from our customers.
General Questions
How do you sell this offering? Does one have to license each of the 5 components? All these products are separately licensed.
For the shopping cart example: Wouldn't the shopping cart already be in SQL as part of an application? Are you saying that we move the data from SQL (in the cart app) to NoSQL and back to SQL (in the warehouse)? That is one option. In fact it is what most of the large eRetailers actually do. The shopping cart is basically a web session object and remains in the NoSQL environment (like other session-specific objects) until it is finalized. Once completed, the contents of the shopping cart are stored into the RDBMS for fulfillment, accounting, billing, marketing analysis and reporting.
Does Big data support all databases or only Oracle database? You can use a wide range of different databases in big data solutions.
How do you see BigData and Analytics on mobile devices come together? We do see this coming. For example, we have added support for mobile devices with our latest BI software. Does that answer your question?
Hi. Looking to get some sort of general overview regarding Oracle vs. MySQL capacities/relevance for BigData apps. Any white papers or graphics known to be available? ~Rob Check oracle.com/bigdata for the different products. There are several white papers and more info available.
How is Big Data helpful in transportation networks and logistics? That would be a long answer! But, yes, we do see big data in that space. Tracking the location of packages and analyzing shipping trends, to give one (simple) example.
Can we migrate data from one database to other database using Big Data apart from unstructured data? Oracle Golden Gate can help with that, independent of big data.
If Oracle using R to do the data mining, why still use Oracle Data Mining tool? For data mining, customers may choose to work in R or SQL with the Oracle Advanced Analytics option.

Oracle R Enterprise provides access to R's rich set of data mining functions and open source packages. User R scripts can be invoked from the user's local R environment or, using embedded R execution, can be executed at the database server, either using an R or SQL interface. Using the SQL interface, R scripts can be operationalized in database workflows. In addition to R's data mining capabilities, Oracle R Enterprise provides access to in-database data mining algorithms.

Oracle Data Mining provides algorithms that are accessible through a SQL API and a graphical user interface, Oracle Data Miner. Users who need an exclusive SQL interface or the Oracle Data Miner for in-database mining would use ODM directly.
Clarifying ... it seems all these Oracle solutions are only one size - Super Big ... there's enough functionality to conquer anybody's problem ... but ... I don't see any of your sales pitches scaled to actual corporate demand management deliverables ! What size do you need? We hear that the typical Hadoop cluster in the enterprise is 100 nodes, which makes the Big Data Appliance a good size.
One of the big challenges with R is that it cannot handle large data sets, right? Are we doing it differently? Yes, Oracle R Enterprise runs in the database.
What other database are supported for Oracle Data integrator? There's a full list on the oracle.com site
Is there a newer version of Oracle that will include all phases (acquire, organize, analyze and decide) in one single database for big data? By Oracle do you mean Oracle Database? For big data people tend to use multiple products for these phases. One big database is probably not the way to go, given the diversity of the different kinds of data.
Is the R statistical programming language something like SAS? It has similar technical capabilities.
Are these clusters based on a shared-everything or a shared-nothing architecture ? Hadoop clusters essentially follow principles of shared-nothing architectures.
Can I use BigData as the data store for Oracle Coherence or in other words front-end BigData with Coherence? Oracle Coherence is an in-memory key-value grid cache. You can use Oracle Coherence as your distributed in-memory cache for your application. If your application reads data that is stored in the NoSQL Database or HDFS, it can be cached in Coherence if that is the requirement and the data set fits into the available memory. At this time there are no automated tools for loading NoSQL or HDFS data into Coherence -- which wouldn't really make sense, since you really only want the working data set to be managed by Coherence, not all of the data stored in NoSQL or HDFS.
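The "cache only the working set" point in the answer above can be sketched as a small read-through cache with least-recently-used eviction. This is plain Python, not Coherence: the class `WorkingSetCache`, the capacity of 2 and the dict standing in for the NoSQL Database or HDFS are all hypothetical.

```python
from collections import OrderedDict

class WorkingSetCache:
    """Bounded in-memory cache in front of a larger, slower backing store."""

    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store
        self.capacity = capacity
        self.entries = OrderedDict()
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)       # mark as most recently used
            return self.entries[key]
        self.misses += 1
        value = self.backing_store[key]         # slow path: read from the store
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict least recently used
        return value

backing = {f"k{i}": f"v{i}" for i in range(5)}  # stand-in for NoSQL/HDFS data
cache = WorkingSetCache(backing, capacity=2)
for key in ["k0", "k1", "k0", "k2", "k1"]:
    cache.get(key)
```

Entries are loaded on demand as the application reads them, so the cache never needs a bulk load of everything in the backing store.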
What is typical cost for a basic installation? Check oracle.com/pricing
Will this impact the future of relational DBs for large databases, especially current data warehouse practices based on RDBMS? This will extend the capabilities of the Data Warehouse, both in terms of data capture (scaling out large data sets on commodity hardware) and in terms of processing the newly acquired data and joining it with the existing functionality in the Data Warehouse. The Data Warehouse and the feature-rich environment of the RDBMS with SQL will continue to be a key component within BI and Data Analysis functionality.
Is Oracle recommending a 4-tier architecture for Big Data? It depends on your application and your operational requirements. Oracle provides the technology and flexibility to design and implement Big Data solutions focusing on the four phases of Big Data -- Acquisition, Organization, Analysis and Decision. A-O-A-D is not really a 4-tier architecture; it's a framework for how to think about the various technical requirements that surface when considering Big Data projects. Which specific technology you choose, how many tiers you implement and what your Big Data solution architecture looks like will depend primarily on the data processing and operational requirements of your application.
We have an OLTP system capturing around 50+ GB of data/day and the DB has 10+ TB of data on it and anything older than 30 days can be considered as historical. Can big data help in this situation? You seem to have both the requirements/characteristics of high volume and high velocity. Big Data can certainly help to address those needs. How those requirements fit in with the rest of your OLTP application requirements will determine whether Big Data technologies (like NoSQL Database and HDFS) are appropriate for your data management needs. For example, if your OLTP application requires SQL, complex joins and real-time data access, then an RDBMS is a better fit for your requirements than NoSQL Database or HDFS. However, if your OLTP application requires high-performance, high-throughput, very simple key-value record-based reads and writes, then the NoSQL Database would be a good storage technology option.
Is there a comparison chart of what DB to use in what type of application? (BDB, NoSQL, MySQL Cluster, Cache Coherence, TimesTen IMDB Cache?) Each one of these products has documentation that includes discussion of some of their primary use cases. It is difficult to say that a given application type requires a particular data storage product. It is not so much about the application type as the application requirements, and how those requirements map to the most common use cases associated with a given product. For most application types, there may be two or three different storage technologies that would be viable options. Your choice would be driven more by the specific application requirements.
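For a sense of scale, the figures in the question above can be turned into a quick back-of-the-envelope calculation. The sketch assumes a constant ingest rate of exactly 50 GB/day, a total of exactly 10 TB and decimal units (1 TB = 1000 GB); the question itself says "50+" and "10+", so the real numbers would be somewhat higher.

```python
daily_ingest_gb = 50        # "50+ GB of data/day" from the question
retention_days = 30         # data newer than this counts as active
total_db_tb = 10            # "10+ TB" from the question

# Active (non-historical) data: what arrived in the last 30 days.
active_tb = daily_ingest_gb * retention_days / 1000

# Everything else is historical by the questioner's own definition.
historical_tb = total_db_tb - active_tb
historical_share = historical_tb / total_db_tb

# Growth if the ingest rate stays flat for a year.
yearly_growth_tb = daily_ingest_gb * 365 / 1000
```

Under these assumptions roughly 1.5 TB is active and about 8.5 TB (85%) is historical, with around 18 TB of new data per year, which is the kind of split that makes it worth asking which storage technology suits each portion.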
When will Oracle Education be having classes relate to "Big Data" . . . Hadoop, NoSQL, Oracle Enterprise R ? We have the following classes: (1) Introduction to Big Data -- In this seminar, students will be introduced to Oracle Big Data Appliance and learn technologies for all the phases: acquiring, organizing, analyzing and conquering Big Data. We also have (2) Introduction to Oracle NoSQL-- In this seminar, students will learn about Big Data and NoSQL concepts, identify and use Java APIs to access the Oracle NoSQL Database, and use the Administration console. You can also view learning paths for Oracle Big Data Appliance Training here.