Oracle Big Data Essentials Online Forum - Session Q&A Transcript (Raw) from Feb. 16, 2012



Panel Discussion

Question Response
Is NoSQL really only geared toward data ingestion? It would seem like retrieval would be very expensive. Retrieval based on a known index is fast. Queries across indexes are very applicable to Hadoop-type operations.
If we are running a DW in Oracle, can we migrate to Big Data/Hadoop, or do we still need to keep the existing 11g DW for the DECIDE phase? Typically we would recommend a more traditional DW platform such as Oracle for the decide phase.
Cloudera Yes, the Oracle BDA bundles Cloudera Enterprise, including the open source CDH platform and Cloudera Manager 3.7.
We have a customer who wants to track his vehicle locations via GPS tracking devices, with an estimated 1-3 TB of generated data yearly (there's no need to store older data). Is Oracle SE enough to handle that, and is there any "solution strategy"? SE can handle this amount of data. However, SE has no parallel capabilities, so data loads and queries over large amounts of data could be slow.
I am asking: is Oracle Big Data Appliance using the Cloudera distribution of Hadoop? Yes
Do NoSQL databases rely on the application to enforce data integrity rules? Yes
How will Oracle (and Cloudera) keep true to open source if a lot of solutions will be included in "closed" Oracle appliances? Cloudera invests substantial engineering spend in the open source platform. The BDA bundles CDH, which is and will remain a 100% Apache-based platform under the Apache license. There are additional pieces of software in the bundle, but the core Hadoop HDFS storage and MapReduce compute infrastructure is and will remain 100% open source.
How do you create a DR solution for this kind of technology? Typically the technology builds in some level of data redundancy. For example, HDFS has a mirroring capability. For very large environments, data may be loaded into two different environments at the same time.
Big Data and Exadata appliance difference? Yes - one runs Big Data software such as Cloudera, Oracle NoSQL, etc.; the other runs the Oracle Database. They are complementary solutions.
Is velocity the important characteristic of Big Data, or volume? Yes, and variety. They all are design considerations for how you approach and select a solution for a problem with challenges on one of those dimensions.
my feed is freezing a lot so I apologize if you answered this already, but how does cloudera compare to other big data products/solutions? Cloudera was the first vendor in the Hadoop space. The guy who created Hadoop, Doug Cutting, works here. We contribute substantially to the Hadoop project and related open source efforts. We focus on easing enterprise adoption of Hadoop; we're the number one vendor in the market doing that, and it's our long-term strategic focus, so it's where we'll continue to innovate.
Is there a yardstick for defining Big Data? There are four parameters that are used to define Big Data: High Volume, Variety of sources, High Velocity and Low Density Value (lots of noise).
Please clarify why we cannot utilize the existing Oracle Database or any other relational RDBMS in place of NoSQL. After all, Oracle can work with the file system as well: external tables or BFILE, for example. You can of course use Oracle RDBMS to acquire data as well. However, Oracle is a reasonably expensive solution, so it is typically used for transactional data. NoSQL and other Big Data solutions are typically used when there is lots of low-level data that may not yet have been processed into transaction data.
How is Hadoop different from Netezza or Greenplum? Several dimensions of differentiation - Hadoop is software where Netezza is an appliance. Both Netezza and Greenplum handle relational data in an MPP shared-nothing architecture, but require an a priori data model.
cloudera has big data appliances and oracle has announced oracle big data appliance, customers would like to know the architectural differences between the two Cloudera's a software company; we don't sell appliances. We're working with Oracle, who do develop an appliance, to deliver big data services generally, including Cloudera Enterprise. Our software bundled on Oracle's hardware is the appliance offering that is BDA.
What kind of application/industry will mostly benefit from this? Generally industries that generate a lot of instrumentation data - web-based industries (web logs), health (medical data), utilities (network data), etc.
Do you see organizations doing BI with both structured (RDBMS) and unstructured data together, or does the nature of the data tend to keep them separate? Absolutely, by combining data from different sources, and data of different types, you can derive a lot more value out of analyzing it.
What is Oracle's relationship with Cloudera ? Oracle resells Cloudera Enterprise, including our open source CDH platform built on Apache Hadoop, and our Cloudera Manager software, currently version 3.7. We're independent, but working closely on this partnership to ease adoption of Hadoop by enterprise buyers and by existing Oracle customers.
Is a relational database a complement to HDFS? Yes - and a complement to other types of non-relational data storage. The data management challenge is how to manage the heterogeneous data systems across all data types - with RDBMS, non-relational, distributed, content, etc.
Why would someone prefer Big Data over Hadoop? Big Data is the phenomenon. Hadoop is one way of processing Big Data. They are complementary.
How can you secure a hadoop infrastructure? Kerberos is the core of the Hadoop authentication and privilege management system. In addition, customers use strong encryption to ensure security of data at rest. Hadoop's made enormous strides in the last few years in providing enterprise-grade security, and you'll see that continue.
Many enterprises may not have the volumes justifying Big Data but still have data typical of NoSQL databases, like GPS data. My question is: don't you think that the only way to see Big Data used widely is via cloud services and public data (government stats, for example)? Most customers I talk to are not yet happy putting their data into a public cloud service. Instead they build out the database cloud internally.
what are some of the Big Data use cases you see in financial services / banking space? Apart from the general use cases (individual marketing, log management, etc.), we're seeing interest for risk calculation, fraud and some specific financial security use cases like money laundering.
Why would someone prefer BerkeleyDB over Hadoop? BDB is a single node solution. We will cover NoSQL Database and Hadoop in more detail in the next session.
OK thanks, so Oracle Big Data Appliance comes bundled with Cloudera software; can customers pick and choose the software components? Customers can add additional software components as they like.
How does Big Data manage and address the garbage in, garbage out problem to ensure only good and relevant data is being captured or transformed? Data Quality is still an important issue in Big Data, and while there aren't many out-of-the-box solutions yet, we will see more of those come out soon. Data Quality for Big Data is one of the items on Accenture's R&D agenda.
what is the criteria to select between Hadoop vs. NoSQL? The differences and use cases for Hadoop vs. NoSQL will be covered in the next session, “Acquiring Big Data” at 11:00 a.m. PT / 2:00 p.m. ET. Please return to the main page and click on the link for the next session.
How about financial industries like banks? The financial services industry presents many opportunities for Big Data solutions - many based around large-scale data processing problems, such as fraud. While financial transactions will continue to be transactional, the question is how to store data efficiently and process the extremely large quantities.
Please redefine Hadoop. Apache Hadoop is an open source framework that allows you to process data in parallel fashion through programs written using the MapReduce paradigm.
What would be considered large volumes of data? We have customers using Hadoop productively on modest data volumes -- as small as a few terabytes. In that case, the computational workload -- the analysis -- is pretty sophisticated. Hadoop scales extremely well, of course. We have multiple customers with more than 10 petabytes on spinning disk managed by Hadoop. Good news is you can get that big, but of course you needn't. More than a quarter of all Hadoop clusters have 100 TB or fewer.
Isn't Google already doing the same thing, doing massive data mining to figure out user preferences, how is their approach different than that of Hadoop/Oracle solutions? They use these techniques. They published a key paper on MapReduce which helped to start Hadoop. Yahoo are big Hadoop users. Big difference between them and more mainstream use is the sheer size of what Google does
how does cloudera compare to other big data products/solutions? We were the first vendor to concentrate on enterprise adoption of Apache Hadoop. We've been in business for nearly four years, and have extensive experience with many customers across vertical markets. We've got the industry's only management console for operating Hadoop in production. We're all in on Hadoop; that's the platform we provide, and the one on which we plan to concentrate long term.
Have you seen any BigData implementations in the Healthcare Insurance space? Yes - the complexities and volume of data in Healthcare will require Big Data solutions. For insurance, think fraud - analyzing large volumes of disparate data sets for unique patterns.
Given Oracle's vertical market solution plays, will there be a set of 'Big Data' appliance industry solutions, such as we see with SAP HANA solutions, e.g. smart metering? Yes, we are working on a number of industry-led initiatives.
Don't you think velocity can be handled by a traditional database system depending on the design, whereas volume and variety may be an issue for a traditional database system with respect to TCO? Yes. The real difference is value. Big Data typically is of low informational value, so it needs a lot of post-processing to turn it into something of significance. Big Data provides a low-cost platform to do this sort of processing on, prior to being loaded into an Oracle Database.
How do you compare Oracle's big data technology to that of Microsoft's big data technology? I don't know enough about Microsoft's Big Data technology to comment. That in itself is very telling
How well does Hadoop performance compare to other Big Data technologies? Are there any documented benchmarks? The Big Data technologies perform very differently on different access patterns and analytical use cases. It's difficult to answer in the abstract.
Does Oracle Big Data offer any SQL syntax way to query (map/reduce) directly to your Hadoop Data Store? There are ways to use SQL to access or query data in HDFS. Check out the Big Data Connectors product for a starting point.
Is MapReduce a programming language? A batch processing paradigm, often rendered in a language such as Java.
Big Data seems to be being used synonymously with "unstructured" data. Aren't these techniques and technologies still interesting with moderate sized unstructured data sets? Definitely. The techniques have broad applicability, and in the near future we'll see more and more use cases where Hadoop and related technologies will be used. There is no strict definition of "how big is big" - it depends on your data and what you do with it; "Big Data" technologies may be applicable for data sets from gigabytes through petabytes.
What is achievable by Big data but not by DSS running on Exadata? Difference is in price point metrics. Big data is low value. Process it into high value data on Big Data, move results into traditional database for easy access via the rest of the org.
what is the single point of failure issue with hadoop The name node.
How much Java programming is required for Hadoop vs. the Big Data solution? People typically, though not always, use Java for MapReduce programming.
Do you have a whitepaper which details the role of Big Data platforms along with RDBMS? Yes - many - check them out here.
Can Big Data play an important role in the content area and Oracle-related products like UCM, SecureFiles etc., or not at all? Content managed in these systems may be the source of data for the Big Data operations.
How does the Oracle solution, at least the Acquire part of it, compare with the Amazon DynamoDB solution? Dynamo is a NoSQL database. We'll cover that area and Oracle's own NoSQL Database in the next session.
Could the big data simply be loaded into a key-value relational database table on an Exadata-class database machine instead of needing Hadoop? It all depends on the problem, complexity of data set, and data size. For simple, one-dimensional data sets a columnar or key-value store such as Cassandra could suffice. A relational DB may break down when volumes become sufficiently high - hence the need for a distributed system.
And redefine NOSQL please Not only SQL - a term to describe the multitude of data storage and processing solutions beyond traditional RDBMS.
In the public market there is a lot of talk about open data. All that data is, from an analysis point of view, unstructured. Can I put the Big Data solution on top of it so I don't have to worry about the relational structure of the original data? Yes, especially for exploratory analysis, treating data as unstructured may speed up the process significantly. If there is inherent structure within the data, then it makes sense to use it to make the analysis easier.
What are difference or similarity between Big Data & Netezza ? Big Data is typically low information density. Netezza is more of a traditional database (like the Oracle Database). So one is a pre-cursor stage for the other.
Can someone on your panel speak about the security of HDFS? It is very primitive in its current state. We've done a lot of work to integrate authentication and ACLs into Cloudera Enterprise. This is a big piece of the value of our product, frankly; it's an area where the community did some early work but didn't completely integrate security across all the components. In addition, customers very commonly use strong block-based encryption to guarantee security of data at rest. You'll see Hadoop in particular and big data in general improve in that area.
What is the general duration of a Hadoop implementation project in a company? Answer varies by customer and use case, of course, but it's common to see an exploratory or proof-of-concept phase of a couple of months, followed by limited production deployment on just a few use cases of another six or so months, followed by general production deployment.
How can small to medium size business tap into big data? One way to minimize upfront costs and try out these technologies quickly is by using public cloud infrastructure. Most of the software is open source, so you can try it for free, and a number of vendors provide packaged images (e.g. AMI for Amazon's cloud), so you can try it very easily and at very low cost, before fully diving into it.
Does Oracle Data Integrator bring Big Data/Hadoop into RDBMS format so that it can be used with data warehouses? Oracle Loader for Hadoop does that - it takes data out of Hadoop processes and streams it into the Oracle Database. ODI is used to generate the workflows that do this.
Going to big data means reduced window of opportunity to check the quality of data, is that a problem when this data is used to make business decision? Potentially! But the data also opens up new opportunities by giving you information about things your conventional data doesn't
Do we need to change how we architect databases for Big Data, as is required for columnar databases? Column-based storage is typically an attribute of the database, and as such doesn't often need to figure into a logical design. However, some queries are better suited to column-based storage than others.
Would there be a risk in standardizing on one ETL tool to process the data? No risk in particular. We've been working hard to make sure that Hadoop integrates well with the infrastructure already in use by customers.
Would there be any benefit to ingesting structured data into Big Data? Yes - depending on the size and complexity of data processing. It helps if the data set is large enough that it needs to be distributed across multiple systems, or if the processing requires large batch processes across distributed data sets.
what about the technical differences that make cloudera a better choice over the other out there? Cloudera has good support and market presence
is Big Data the same as Business Intelligence, difference is processing is on a cloud vs. ETL? Not quite. Big Data really means you are going to use a very wide and diverse pool of data to start with. Then new techniques and tools to manipulate it all. But you still use BI tools to look at the results of the analysis and make decisions
I've heard that Endeca is a great way of mining through small-mid size data sets without using Hadoop is that right? Yes
Is Big Data more about distributed processing of large amounts of discrete data, which is then aligned, converged or combined with traditional "relational" data? Big Data technologies enable large-scale distributed data storage and processing, which then requires additional effort to align, converge, and integrate with traditional RDBMS. The data management challenge becomes more complex with heterogeneous data sets.
What is the difference between "structured", "semi-structured" and "unstructured"? How do we define "structuredness" of data? I like to joke that all data has SOME structure. Structured data typically has a recognizable organization - a schema if you will. Less structured data typically is things like comments from a customer service rep system, or survey responses. Semi-structured falls in between, like a set of key-value pairs where the values are images.
Do I pay extra for Cloudera on the Big Data Appliance or is it included? Included as part of the Big Data Appliance software
How many servers are used in an average big data solution? Typical Hadoop clusters are in the range of 20 nodes (pilot projects) to 100 nodes (for initial deployments). But this varies widely according to needs and expertise
Microsoft and IBM also adopted Hadoop and HDFS. Is there a possibility of connecting their solutions and Oracle's solution for integrated analysis and dashboards? Yes
What is the role of cloudera in this technology? Cloudera provides the Hadoop implementation
How long are companies typically keeping the data collected for BigData analysis? It all depends on how useful the data is. A key benefit of Hadoop is that it allows you to take advantage of cheap disks to store huge volumes of data at low cost (volumes that previously would be thrown away or archived to tape). But data like raw logs may not have much value beyond days or months after it has been collected.
How are we going to address the bandwidth for backing up and restoring big data from the public cloud? You will need to talk to your public cloud provider for that one.
Are there modeling methodologies for Big Data like in relational models, or is it just an anarchy of pairs? Generally it is the "freedom" of named value pairs. Of course if you want a relational model you can also use a traditional database as well; they aren't mutually exclusive.
Does OracleNoSQL DB use Hadoop? No - NoSQL is a named value pair database that can be used to store the data that may then be fed into a Hadoop process.
Given the coding requirements that Hadoop has to query things out from it - would you say the DBAs need to be programmers as well? Probably not - typically programmers do the Hadoop programming.
How does Hadoop help move all data (web blogs, emails, customer audio interactions etc.) into the nodes? Think of Hadoop as a big sorting operation. One node looks at the data it has, and then sends it to another node for processing. That node then picks it up, looks at it a different way, and sends it to another node. With enough nodes you can sort and aggregate data really, really quickly.
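The "big sorting operation" described above can be sketched in a few lines. This is a single-process illustration of the map, shuffle (group and sort by key) and reduce steps, not Hadoop's actual API; the function names are hypothetical and a real job would spread each phase across many nodes.

```python
from collections import defaultdict


def map_phase(records):
    """Look at each record and emit (key, value) pairs; here, one pair per word."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group values by key and sort by key, standing in for sending data between nodes."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Aggregate the values collected for each key."""
    return {key: sum(values) for key, values in grouped}


def word_count(records):
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    print(word_count(["error in module a", "warning in module b", "error again"]))
```

The sort-and-group step is where the work gets redistributed: all values for one key end up in one place, so each reducer can aggregate independently of the others.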
Does Oracle offer a Cloud solution for Big Data? As in a public cloud? Not at this time. Most customers are deploying private clouds; the Big Data Appliance is an example of a pre-built private cloud for Big Data.
What questions need to be asked to a customer that is looking to consider Oracle Big data What is their Big Data requirement ? What business problem are they trying to solve ?
does hdfs allow to partition data eg I use it to support a saas solution that partitions data between tenants You would need to build that as part of the deployed architecture

Acquiring Big Data

Question Response
50,000 users using my company's website generate 13 million JSON events data daily - would this be considered as Big Data? Yes. Big Data means different things to different people. If you think about Big Data as being Volume-Velocity-Variety-Value, your application shows a great combination of velocity and variety. There are probably elements of volume as well.
ACID? Atomic, consistent, isolated, durable.
are there any guidelines as to what technology should i be using given a use case- relational, columnar db, hadoop/hdfs or nosql? It's a complex question! This session covered NoSQL vs HDFS. We have some white papers on that might help
Are there any limits for real time data transmission? As with most large, distributed systems, the common bottlenecks are going to be network bandwidth and I/O subsystem. You definitely want to plan for large capacity networks and multi-spindle high performance disk subsystems.
Are there any plans to integrate Oracle NoSQL with commercial-grade Hadoop solutions like MapR? Oracle NoSQL Database has a Hadoop export class. It was designed to be able to be easily integrated with any of the Hadoop-based tools.
can 2 different key/value pairs be related, as a foreign key relationship? or such relationship is not needed with big data. Yes, they can be related. In the current release that relationship (and nested lookup, if needed) must be enforced/implemented by the application.
Can Hadoop handle integrating changed data with previously existing data with different options (say overwrite or keeping multiple versions of the data,etc)? Hadoop provides you the framework to process large amount of data in a massively parallel way. It is up to the program written, using the MapReduce paradigm to handle such requirement.
Can I obtain Oracle use cases on Big Data ? if so where can I download these ? Start by looking at and the individual products there
Can the presenter cover durability and CAP comparisons? That level of detailed technology discussion is outside of the scope of this talk. However, there are several articles available on the web, including Wikipedia, that explore the CAP theorem, ACID and BASE transaction semantics. That would be the best place to start.
Can we have multiple/Different Major keys in NoSQL database like Userid / ProfileId / Login? Yes, you can have multiple records that share the same major key, but have different minor keys. There is a good discussion of how to define your Major and Minor keys in the NoSQL Documentation, which you can find here.
Could key/value pairs be thought of as columnar storage of tables? Key-value pairs databases are not columnar data stores. They have a simpler/more flexible data model. They are more like key-value records where the value can be a simple value or a complex structure. Think of them more like de-normalized tables.
Could you please provide url links where we can find relevant training on Oracle Big Data products relevant to the role one plays in IT i.e. developer, manager - etc We have the following classes: (1) Introduction to Big Data-- In this seminar, students will be introduced to Oracle Big Data Appliance and learn technologies for all the phases: acquiring, organizing, analyzing and conquering Big Data. We also have (2) Introduction to Oracle NoSQL-- In this seminar, students will learn about Big Data and NoSQL concepts, identify and use Java APIs to access the Oracle NoSQL Database, and use the Administration console. You can also view learning paths for Oracle Big Data Appliance Training here.
Do you have Sample Application using Big Data which can help understand more tech aspects of this ?  
Do new data sources include government data? Certainly.
Does OEM manage O-NoSQL DB? Not yet. Work in progress. Stay tuned for future releases.
hdfs refer to ? Hadoop Distributed File System
How do we acquire data from various social media and sources? Does it use some web crawling method? You can gather/capture/crawl your own data or use any one of a number of services that are now available over the web to provide access to social media data.
how do you access data in noSQL? There is a Java API. You link a database driver into your application and the driver knows how to reach the data and which node it is stored on.
How do you protect namenode You create a second namenode to be used for failover
How does Hadoop move unstructured data to different nodes in HDFS? That is taken care of automatically by Hadoop.
How does HDFS perform in aggregating large amounts of data to get a total?
How does it distribute equal amounts of data to each node in the cluster? Not sure if the question is about HDFS or NoSQL. NoSQL Database uses a hash algorithm on the Major component of the key. In other words, all of the records with the same Major key will hash to, and be stored in, the same storage node. Given a random distribution of records across the key space, this will tend towards an even distribution of records across the various storage nodes.
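As a rough illustration of hashing on the Major key, the sketch below maps a major key to one of a fixed number of storage nodes. The choice of MD5, the modulo step and the function names are assumptions made for the example; the transcript does not say which hash Oracle NoSQL Database uses.

```python
import hashlib


def storage_node_for(major_key: str, num_nodes: int) -> int:
    """Pick a node index from a hash of the major key only (illustrative hash choice)."""
    digest = hashlib.md5(major_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def place_record(major_key: str, minor_key: str, num_nodes: int) -> int:
    """The minor key is ignored on purpose, so records sharing a major key are co-located."""
    return storage_node_for(major_key, num_nodes)


if __name__ == "__main__":
    # Both records for user42 land on the same node because only the major key is hashed.
    print(place_record("user42", "profile", 4), place_record("user42", "login", 4))
```

With many distinct major keys the node counts come out roughly even, which is the "tends towards an even distribution" behaviour described in the answer.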
How do you know the weblogs are from a particular user if the user is not required to log in on the website? You may not. But you probably know all the things that anonymous user did, which is still of use to look for trends and patterns.
How is Hadoop different from Oracle Coherence or TIBCO ActiveSpaces? Coherence is designed for very low latency when all your data fits in RAM on the cluster/Grid. Hadoop is designed for bulk storage of data on disk and does not do low latency like Coherence.
How is Security handled in Oracle NoSQL Database . . . Is there Encryption? How is data at rest handled? At this time, NoSQL Database does not have Security/Encryption support. At this time it is assumed that the NoSQL database is inside of the firewall/Data Center and being accessed by an authenticated/authorized application. We would be interested in learning more about your requirements.
How is the value stored in NoSQL? Can I store a JSON Object or XML data? Can we index some portion of the value? The value is an opaque byte array. It can contain any type of data/data structure that is needed. Yes, you can store JSON or XML strings. NoSQL Database only supports primary key indexing at this time. Your application could certainly implement key extraction, insertion into the key-value database and lookup. So, even in the current release of NoSQL Database you could implement indexing within the application.
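One way an application might implement that kind of indexing is sketched below. Plain Python dictionaries stand in for the key-value store, values are JSON strings stored as opaque bytes, and the class and method names are hypothetical; none of this is the Oracle NoSQL Database API.

```python
import json
from collections import defaultdict


class IndexedKeyValueStore:
    """Key-value store stand-in with an application-maintained index on one field."""

    def __init__(self, indexed_field):
        self.indexed_field = indexed_field
        self.records = {}               # primary key -> opaque bytes
        self.index = defaultdict(set)   # extracted field value -> primary keys

    def get(self, key):
        raw = self.records.get(key)
        return None if raw is None else json.loads(raw.decode("utf-8"))

    def delete(self, key):
        old = self.get(key)
        if old is None:
            return
        if self.indexed_field in old:
            # Remove the stale index entry before dropping the record.
            self.index[old[self.indexed_field]].discard(key)
        del self.records[key]

    def put(self, key, document):
        self.delete(key)  # keeps the index consistent on overwrite
        self.records[key] = json.dumps(document).encode("utf-8")
        if self.indexed_field in document:
            # Key extraction: the application reads the field and indexes it.
            self.index[document[self.indexed_field]].add(key)

    def find_by_field(self, value):
        return sorted(self.index.get(value, set()))
```

The store itself never looks inside the value; extraction, index maintenance and lookup are all the application's responsibility, which is the trade-off the answer describes.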
How should an administrator in the healthcare industry best be trained to capture patient data so as to prevent GARBAGE IN, GARBAGE OUT scenarios? It's hard to provide a specific answer without more details about the problem that you're trying to solve. Generally speaking, I would apply a periodic Hadoop/MR job to the incoming patient data to identify "GARBAGE" and either delete it or mark it for follow-up. Basically, combine the large amounts of data that need to be filtered, using relatively simple filters, with the parallelism and scalability of Hadoop/MR.
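A filter of the "relatively simple" kind mentioned above might look like the following sketch. The field names (`patient_id`, `age`) and the validity rules are invented for illustration; in a Hadoop/MR job the `validate` function would run inside the map step over each incoming record.

```python
def validate(record):
    """Return a list of problems found in one record; an empty list means it looks clean."""
    problems = []
    if not record.get("patient_id"):
        problems.append("missing patient_id")
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 130:
        problems.append("implausible age")
    return problems


def filter_records(records):
    """Split records into clean ones and ones flagged for follow-up."""
    clean, flagged = [], []
    for record in records:
        problems = validate(record)
        if problems:
            flagged.append((record, problems))
        else:
            clean.append(record)
    return clean, flagged
```

Flagging rather than deleting keeps the original data available for review, which matches the "mark for follow-up" option in the answer.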
I thought you can have one namenode per Hadoop cluster. You can only have one namenode active, but you can have a second one as passive.
In Hadoop, is it possible to take advantage only of the MapReduce phase alone? For example if I don’t need to store any information, but instead I just want to stream it as it comes in and produce analytic output on the fly? You can stream data through Hadoop and load it somewhere, say in Oracle Database. If you use MapReduce you would be at least temporarily putting the data on HDFS.
Is it like a hierarchical model? Key-value pairs can be used to represent a hierarchy. It all depends on how you structure your key space.
Is monitoring Oracle DB (AWR) a good candidate for Big Data?  
Is NoSQL an open source database? There is an open source community edition and also an enterprise edition
is Oracle NoSQL Database included in the normal RDBMS software bundle? The Oracle NoSQL Database Community Edition is included with the Oracle Big Data Appliance. The Oracle NoSQL Database Enterprise Edition is available separately. You can learn more at
Is Oracle NoSQL more close to Hbase, instead of Hadoop? To Hbase. Hbase is also a NoSQL database, though with different characteristics.
Is Oracle positioning NoSQL as an alternative to Hadoop for big data? It's an option for certain use cases, but not a replacement for HDFS and totally complementary to MapReduce
Is the only legitimate use case for NoSQL for applications in the Big Data space, or would more traditional applications gain value from converting their database to NoSQL? Generally "traditional" applications use an RDBMS and take advantage of many things it can do that NoSQL can't. Remember you can't do complex queries with a NoSQL Database. But there are always exceptions to that general rule.
Are there any solutions provided by Oracle to address the single point of failure - the name node? We work with partners like Cloudera and there is work in the community to start to address this.
So NoSQL absorbs data on multiple nodes with replication, but then you have to load it all up into HDFS some other place, OR can you Hadoop right into Berkeley NoSQL (bypass HDFS)? Oracle NoSQL Database does not store its data in HDFS. It creates and manages storage in the local file system (file system that is local to the storage node).
What about acquiring GPS tracking data for vehicles? What's the best solution (big data quantity + complex routing reports)? Generally speaking, if the GPS data that you're capturing is going to be used for analysis later (batch), then HDFS is probably a good solution for storing/managing that data. However, if access to individual records, in real time or near real-time, is critical to your application, NoSQL Database would be a better option.
What about use models that share both the "real-time services" characteristic and the "batch read" characteristic; i.e., a web service that allows users to interact with their own data in real-time, but allows managers to analyze across 100s of users? You could use a mixture. Store the real-time data in a database - you need that for latency and ease of simple queries. But extract all that data and process it using Hadoop for the in-depth batch analysis.
What are the hardware requirements for HDFS? Hadoop can run on most commodity hardware. No major constraints. Of course, you also need to take into account your own performance and reliability constraints.
What data schema is widely used for Big Data? No standard. It depends on the data source and the intended use.
what if I want to do both types of acquisition and analysis (real-time and bulk), and I need to do it in an appliance (much cheaper than normal tools cost)...what options do I have? Take a look at Oracle Big Data Appliance. and click on the BDA link
What is Hadoop? A product, methodology, paradigm? An open source project, that some companies do commercial distributions of. We'll cover it in more detail in this session and the next.
what is the hashtag on twitter for this session? The hashtag for Twitter is #oraclebigdata
What Oracle acquired product did the NoSQL come from? It was developed internally at Oracle. Berkeley DB, which was acquired, is a component, but this is a new product, not a BDB extension.
what should be the format of the data coming into HDFS or Nosql? Both HDFS and NoSQL are schema-less, so you can use the format that best fits your application. Thanks for attending the Acquire session, please see the Organizing Big Data session.
What's the largest size of data you can store for a key in Oracle NoSQL? There is no implemented limitation. We've seen keys anywhere from a few tens of bytes to thousands of bytes. Generally, we recommend that the key stay as small as possible to ensure the best performance and to reduce the memory requirements for the B-tree.
When do we recommend using the NoSQL db as opposed to using a KeyValue pair table in the Oracle relational database? If you need key-value pair tables, joins, SQL, etc. you definitely need an RDBMS. If you only need key-value storage, with simple get/put/delete operations over specific keys or key ranges, then NoSQL Database is probably the better fit.
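To make the "simple get/put/delete operations over specific keys or key ranges" access pattern in the answer above concrete, here is a minimal Python sketch. It is a toy in-memory store and not the Oracle NoSQL Database API; the class name TinyKVStore, the key layout, and the sample values are assumptions made purely for illustration.

```python
import bisect


class TinyKVStore:
    """Toy in-memory key-value store (hypothetical; not the Oracle NoSQL API)."""

    def __init__(self):
        self._keys = []    # keys kept sorted so that range scans are cheap
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def delete(self, key):
        if key not in self._values:
            return False
        del self._values[key]
        del self._keys[bisect.bisect_left(self._keys, key)]
        return True

    def range(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


store = TinyKVStore()
store.put("user/001/name", "Ana")
store.put("user/001/city", "Lisbon")
store.put("user/002/name", "Raj")

# All entries for one user via a key-range scan; no join or SQL involved.
profile_001 = store.range("user/001/", "user/001/~")
```

Anything beyond this pattern (joins across tables, ad hoc SQL predicates) is where the answer points to an RDBMS instead.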
Where is the name "Hadoop" derived from? The toy elephant belonging to the son of the person who started the project.
Which (other than acquiring big data) use cases cover Oracle NoSQL as stand-alone product? We will cover this in a little more detail in the session. Think about things like online profiles where you store customer profiles and modify them in real time to change the experience they see. Think of the customized pages on a site like Amazon.
Why would you use Oracle NoSQL vs HBase/Hive? We will cover this later in the slides. Basically, Oracle NoSQL has some advantages in terms of ease of programming, support, manageability.
With Twitter data, do companies store the contents of the links that are included in the tweets? or is it algorithmically determined? That depends on what you want to use the tweets for. You can certainly store those links if you want, or purchase that info from companies who provide digests of twitter info

Organizing Big Data

Question Response
What about compatibility with EBS/PeopleSoft/Siebel enterprise apps? Big Data (unstructured data) can be processed using Hadoop or the Big Data Appliance, and Oracle Loader for Hadoop can then be used to load properly formatted results into an application database table. This requires, however, that the apps are running on an 11.2.2 or above instance.
Will Oracle Express edition support the Oracle Direct Connector for HDFS? This is not in the current plans.
Is it possible to modify data on HDFS using Direct Connector? (INSERT/UPDATE/DELETE) This is not currently supported.
How is MapReduce different from PIG, HIVE? Pig is a scripting language that produces MapReduce jobs; Hive is a SQL interface to HDFS files.
Does Oracle have any Products to load data onto HDFS? The results of MapReduce jobs can be loaded into HDFS. You can also use Oracle Data Integrator to load external file data into HDFS. If the question is whether you can load data from Oracle tables into HDFS, today this can be done with the open source Sqoop product that is part of the CDH distribution on the Big Data Appliance.
How much does the Appliance cost? The list price for Big Data Appliance can be found here.
Will you support CDH4 when it releases? Future versions of Big Data Appliance will include new releases of CDH and Cloudera Manager, as well as any updates and patches.
Can I load data onto HDFS? Yes. MapReduce jobs can certainly load HDFS.
Can the Big Data Appliance support both Hadoop/HDFS and your NoSQL environments? Yes. Big Data Appliance includes a full Cloudera distribution (including HDFS, HBASE, etc.) as well as support for Oracle NoSQL Database.
What is the URL for the Direct Connector pricing? Information on pricing is available on the US Oracle Technology Commercial Price List.
Can the Oracle Big Data connectors be acquired independently of the Oracle Big Data Appliance? Yes, the Big Data Connectors are sold separately and you can license Big Data Connectors on non-BDA systems. Oracle licenses and supports Big Data Connectors on both BDA and on customer Hadoop systems as long as the Hadoop and database versions are certified with Big Data Connectors.
In Hadoop, is it possible to take advantage only of the MapReduce phase alone? For example if I don’t need to store any information, but instead I just want to stream it as it comes in and produce analytic output on the fly? You can stream data through Hadoop and load it somewhere, say in Oracle Database. If you use MapReduce you would be at least temporarily putting the data on HDFS.
Can you virtualize the Hadoop nodes? If your question is can a Hadoop node be on a virtual machine, the answer is yes. For example you can have Hadoop running on 6 VMs to have a 6-node Hadoop cluster. However, in general, Hadoop is not used in this environment for production deployments because performance (especially to virtualized disks) can be significantly slower than for physical machines.
Do you recommend running MapReduce on real-time transaction data to perform aggregates to eventually slice/dice the data downstream? Hadoop/MapReduce works best for batch processing of data. If you want to do batch type of processing on real-time transaction data, then yes, MapReduce can be used.
We intend to implement a Data Warehouse, the source data is stored in many logical databases. Is Hadoop an appropriate technology to keep the warehouse up to date? If you want to do processing on the data in parallel then Hadoop is a good fit. If you are just accessing multiple logical databases, then there are several other technologies to do that.
Will ODC read the datanodes of HDFS? Yes, it will read files on HDFS datanodes.
What Oracle Database versions are compatible with the ODC for HDFS? Oracle Database and higher
Does this mean that we have some sort of CREATE TABLE .. STORAGE HDFS DDL syntax in Oracle now? Yes, using the ORACLE_LOADER access driver with external tables you can create an external table that points to data on HDFS.
Can you update Hadoop data using the Direct Connector for HDFS or is it read only? Currently you cannot update data using the Oracle Direct Connector for HDFS
Is Oracle Direct Connector using Hadoop/Mapreduce internally while accessing data in HDFS? No. Oracle Direct Connector for HDFS uses a Hadoop client to access the data, but does not perform a MapReduce job. Oracle Loader for Hadoop runs as a MapReduce job.
Do we have documentation or training about using ODI with Oracle Loader for Hadoop, and where? Documentation is available here. Watch the Big Data Connectors OTN page for training and other upcoming material.
I thought ODI only works for PIG. ODI uses Hive in the ODI Application Adapters for Hadoop.
Once files / data is loaded into HDFS, how does one UPD/DEL? HDFS is a "write-once" file system. You can write new files, and delete files, but cannot update data within files. For those operations you have to use a database.
Will this support my DB2 Database? No, the Oracle Big Data Connectors will not.
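The write-once behavior described above means an "update" is usually expressed as writing a new file and then deleting the old one. The following Python sketch shows that pattern under loud assumptions: a local temporary directory stands in for HDFS, and the file names, the CSV layout, and the helper rewrite_with_update are hypothetical. No HDFS client API is used or implied.

```python
import os
import tempfile


def rewrite_with_update(src_path, dst_path, record_id, new_value):
    """Copy src to a NEW file, replacing the value of one record on the way."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            rid, value = line.rstrip("\n").split(",", 1)
            if rid == record_id:
                value = new_value
            dst.write(f"{rid},{value}\n")


# A local temp directory stands in for an HDFS directory (assumption).
workdir = tempfile.mkdtemp()
old_file = os.path.join(workdir, "part-v1.csv")
new_file = os.path.join(workdir, "part-v2.csv")
with open(old_file, "w") as f:
    f.write("1,red\n2,green\n3,blue\n")

# "Update" record 2 by writing a whole new file, never by editing in place.
rewrite_with_update(old_file, new_file, "2", "yellow")
os.remove(old_file)  # deleting whole files is allowed

with open(new_file) as f:
    updated = f.read()
```

Rewriting whole files is only reasonable for batch-style changes; for frequent record-level updates the answer above applies and a database is the better tool.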
How does the functionality of the Oracle Direct Connector/External Table solution differ from HIVE functionality? Hive is a SQL layer in Hadoop to access data on HDFS. Oracle Direct Connector for HDFS enables you to access data on HDFS using SQL from Oracle Database. The data can now be joined with other tables in the database, imported into the database, etc.
If HIVE provides a SQL interface to HDFS, do I still need Oracle NoSQL? Hive and Oracle NoSQL Database address different application needs. Refer to the Acquiring Big Data session for more details.
How does HDFS map to Oracle Directory on Database server for External Table to work? How about RAC, which node it points to? URIs that point to the datafiles on HDFS are published to an Oracle Database directory that the external tables will point to
Can RevolutionAnalytics be used on Oracle Big data Appliance? Yes, the linux version.
What language skills are required to understand and administer NOsql database? Java
As the nodes on BDA are really beefy, what happens when a single node fails? The impact and rebalance could be really impacting, right? Data are mirrored across different nodes, so when a node fails, the data will still be available. In cases where data needs to be shuffled, we overcome the impact by using InfiniBand as the interconnect.
The Oracle Big Data Machine: Does it run only Hadoop or also run NoSQL? You can run NoSQL Database on the same nodes as HDFS. So they can both share the Big Data Appliance if you want to.
In BigData Appliance data is kept in NoSQL or HDFS? or can we decide what to use? You can use NoSQL or HDFS or both at the same time.
Can we know how to organize data on Oracle NoSQL Database You use a Java API to access NoSQL Database. There is also an API to enable access from Hadoop, moving NoSQL data into HDFS where you can use MapReduce.
Can the connector/loader also be used with an Oracle DB on Exadata? Yes. Also, when used with Exadata it benefits from InfiniBand connectivity between BDA and Exadata.
How do both Hadoop and NoSQL fit into Oracle's big data initiative? Are they competing technologies? If not and they are complementary, how do they work together? Complementary. Did you see the previous presentation which showed when to use NoSQL and when to use HDFS?
What is the minimum configuration price for BIG Data platform Check
What is HBase? HBase is an Apache NoSQL database that comes with Cloudera's Distribution of Hadoop.
Are there any plans to support Hadoop solutions like MapR? No.  
Is there a similar tool to Oracle Loader for Hadoop that can be used for MySQL databases? Look at Sqoop, an open source project.
Do you see no value for the flash tier on your Big Data Appliance? Based on our testing, we don't see flash speeding up BDA at the moment. We will look into this more, but it's not effective at the moment.
For the big data connectors, are you licensing per processor on the Oracle database side or the Hadoop side...and if so, by each Hadoop node (data, edge, name)? Oracle Big Data Connectors are licensed based on the number of processors in the Hadoop cluster. Licensing is based on all the processors in the Hadoop cluster or Big Data Appliance, not on the number of nodes or the number of database processors.
What type of connectivity exists to load NoSQL into a database, similar to the Hadoop connector? You can export NoSQL data to HDFS and use MapReduce or Oracle Big Data Connectors.
With the purchase of Bigdata appliance, do we get Oracle Bigdata connector automatically? No, the Big Data Connectors is a package that is sold separately. Oracle licenses and supports Big Data Connectors on both BDA and on customer Hadoop systems as long as the Hadoop and database versions are certified with Big Data Connectors.
Is there any MapReduce capability for NoSQL? You can export NoSQL data to HDFS and use MapReduce that way.
Yes, I saw the previous presentation. So with Big Data Appliance, Oracle is providing both alternatives (Hadoop and NoSQL) and the user decides which best suits his use case, and can use both for different applications? Yes. Oracle BDA offers a fully supported license to Cloudera's Distribution including Apache Hadoop and Cloudera Manager as well as the Community Edition of Oracle NoSQL DB. The enterprise version of Oracle NoSQL DB is available under separate license. The user can decide which best suits his use case and use whichever is preferred or both.
Do you need Java skills to do Big Data project with Oracle solutions? People generally, not always, use Java for MapReduce programming.
How much does hadoop connect cost? Oracle Direct Connector for HDFS is a feature of Oracle Big Data Connectors. Check for details
What is the difference with the oracle HDFS database and the HDAM that has to do with IMS hierarchical DBMS? These are very different animals. HDAM (Hierarchical Direct Access Method) is a hierarchical indexed access method for a database, where an index or pointer points to various segments of the file. The segments point to one another and are interdependent. Apache HDFS (Hadoop Distributed File System) is not a database and has no indexes. It is a write-once, read-many environment (no updates) that creates multiple replicas of data blocks and distributes them on compute nodes throughout a cluster for performance and availability purposes. There is no interdependency across HDFS blocks; they do not point to one another.
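The idea of "multiple replicas of data blocks" distributed across nodes can be sketched in a few lines of Python. This is a simplified illustration only: it uses plain round-robin placement, whereas real HDFS placement is decided by the NameNode with its own policy. The function names, node names, and sizes are assumptions.

```python
def place_blocks(file_size, block_size, nodes, replicas=3):
    """Map each block index to `replicas` distinct nodes (simple round-robin)."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = -(-file_size // block_size)  # ceiling division
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replicas)]
        for b in range(num_blocks)
    }


def readable_blocks(placement, failed_node):
    """Blocks that still have at least one replica on a surviving node."""
    return [
        b for b, holders in placement.items()
        if any(n != failed_node for n in holders)
    ]


nodes = ["node1", "node2", "node3", "node4"]  # hypothetical node names
placement = place_blocks(file_size=300, block_size=128, nodes=nodes)

# Each block is independent of the others; losing one node loses no block.
still_readable = readable_blocks(placement, failed_node="node2")
```

Because blocks do not point to one another, any surviving replica of a block can be read on its own, which is the availability property the answer describes.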

Analyzing Big Data

Question Response
Support for PMML? The Oracle Advanced Analytics option supports exporting decision tree models and importing linear regression models.
Does the Exalytics component of the Oracle big data platform basically use machine-learning? We have pushed machine learning into Oracle Database, so Exalytics can access such functionality from there. This can be done from SQL or R.
Is there a demo of interactive visual analytics? Yes, you can see general OBIEE demos, but we also have an integration of ORE embedded R execution with OBIEE and BI Publisher.
Can bespoke modelling or visualisation function in R Base run on the Oracle-R enterprise? Oracle R Enterprise does support leveraging R graphics/visualization capabilities.
Do you offer any data virtualization tool that would let us aggregate data across relational, hdfs, NoSQL etc.? The Oracle Connector for HDFS allows you to access data in BDA - exposing that data as an external table. This allows you to easily combine data in Oracle Database with data that is in HDFS.
How do analytics scale to handle full weblogs, customer DB, POS, etc. when using complex algorithms (eg. machine learning) That is the power of the Big Data Appliance. It is an integrated prebuilt/configured/high performance hadoop cluster that pushes the processing across nodes. The Oracle R Connector for Hadoop allows you to push sophisticated analytics across these nodes as well. The scalability and performance is outstanding.
What is the possible performance impact when in-database analytics is performed against large DBs? One of the unique capabilities of Exadata is the ability to push analytics into the storage nodes, greatly improving performance. Data does not need to be pulled into the database processors on the nodes; it is processed in the storage cells.
If you are allowed to say, how do you perform the text-mining on the social-posts example? Are you running tweets and fb posts through a natural language parser like the Stanford NLP? There are different ways of doing this. You can perform text mining in the Big Data Appliance - using R or other libraries (i.e. part of MapReduce processing). You can also use the Oracle Connector for HDFS to give the Oracle Database access to data in Hadoop - and then use Oracle Text mining to do that.
Are you using the hidden markov model for predictive analytics? Oracle R Enterprise enables the use of HMMs in the database using the embedded R engine.
Is R integrated into OBIA? Oracle R Enterprise is part of the Oracle Advanced Analytics option.
Can we use traditional BI platform for analyzing it? Yes, you can still use your traditional BI tools to look at big data. We will touch on that at the end of this session.
Does Oracle provide analytics as a professional services? What are your capabilities in this area? Best to contact your local Oracle office so they can give you an appropriately tailored answer.
Can exalytics run independently of exadata? Yes. You can use either separately or both together. Of course, there is good integration between the two, but Exalytics can also be used in other deployments without Exadata or Oracle Database
I would like to see a map or Visio chart starting with the Internet at one end and an Oracle db at the other to see all of the required software/packages and languages in between. It's pretty hard to draw a graph of data flow for such a generic definition of the problem. But let's assume that you are running a website and collecting weblogs from all the users going to the website, warehousing them in your RDBMS. Then you can use ORE or ORCH to analyze the data. You will need to install the R engine and the ORE packages. R can be downloaded from Oracle's public yum repository and ORE is available from OTN. R must be installed on your client machine and ORE must be installed on both the client and the RDBMS server sides. Then you would be able to run data analysis on the RDBMS side, close to the data. Another path is to use ORCH, but it works with BDA only right now. Going this way, the data will be pulled into BDA and stored on HDFS, and then R analysis can be run on a Hadoop cluster using MapReduce techniques. ORCH is available from Oracle's OTN and with the BDA image.

Conquering Big Data

Question Response
I would like to see a chart showing input from the Internet all the way through to an Oracle database showing me all software/programs and languages required between the two. We have heard about Hadoop, Exadata, the BDA, R language, SQL, the noSQL, etc Of those that you mention, you'll see a holistic illustration in this session.
What is definition of low density data? It refers to large volumes of very detailed data - the detailed entries themselves do not provide much value. But the aggregations and summarized patterns and trends from such detail can provide tremendous value for the business.
How does Oracle RAC fit into all of this? Oracle RAC remains as relevant as ever, both for the initial data capture and for the data warehouse in the analyze phase. The Oracle Big Data Appliance is a hardware cluster designed for parallel processing and filtering low density data (like weblogs), and then the higher value results are transferred through a high speed connection to Oracle Exadata, or a clustered Oracle Database for additional processing.
What is DBA's role in big data world? Great question! There are many areas of big data operations, from new, clustered hardware focused on parallel processing to transforming that data into relational data and introducing it into your BI analytical dashboards, reporting, and larger platform. Big Data does include new development, operational, transformation, and analytical technologies as well as new technology roles. So, the DBA role in a big data world means expanded responsibilities and skills. It's applying your knowledge and experience in data storage and retrieval to a new set of requirements. And, it's about learning lots of new stuff.
Are any of these technologies being leveraged within oracle corp for any of your big data use cases? if so, can you provide some details. Yes. The big data use cases discussed in this session include two views: conceptual view (capability-based) and logical view (maps Oracle tools and products to these capability requirements).
What measures can be put in place for data quality? Do we develop some kind of scale to measure the quality of data based on volume and diversity of source for unstructured data? The measures you put in place for data quality depend on the specific scenario and your requirement. Think in terms of the balance and requirements for data accuracy (for risk and fraud scenario) and/vs. not to filter out too much valuable information (sentiment analysis, for example). The tactics of applying data governance is very different in this new paradigm.
Can you explain the streaming component as shown under "Analyze"? Yes. Big data processing is not exclusively batch-based using Hadoop. Other technologies such as NoSQL and Complex Event Processing (CEP) engines can be used to leverage big data in real time.
How is Oracle CEP different than the Oracle BAM (business activity monitoring) product? Oracle CEP is part of Oracle Event Driven Architecture (EDA) suite and is used for processing streams of high-volume event and transaction data in real time to detect patterns and raise flags. Oracle BAM is a real-time dashboard product to manage and monitor business activities.
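The phrase "processing streams of ... event and transaction data in real time to detect patterns and raise flags" can be illustrated with a small sliding-window check in Python. This is a generic sketch of the idea and not Oracle CEP or its query language; the function detect_bursts, the threshold, the window length, and the sample events are all assumptions.

```python
from collections import defaultdict, deque


def detect_bursts(events, threshold=3, window_seconds=60):
    """Return (timestamp, key) flags whenever `key` has at least `threshold`
    events inside the trailing time window. Events must be time-ordered."""
    recent = defaultdict(deque)  # key -> timestamps still inside the window
    flags = []
    for ts, key in events:
        window = recent[key]
        window.append(ts)
        # Evict timestamps that have fallen out of the trailing window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) >= threshold:
            flags.append((ts, key))
    return flags


# Hypothetical (timestamp_seconds, card_id) transaction events.
events = [(0, "card-A"), (10, "card-B"), (20, "card-A"),
          (45, "card-A"), (200, "card-A")]
flags = detect_bursts(events)
```

Each event is examined once as it arrives and only a small window of state is kept per key, which is what distinguishes this style of processing from a batch job over stored data.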
Can Oracle provide guidance to our architecture review board around best practices and 1st steps we can take? Absolutely! We look forward to applying architecture best practices and reference architecture to help you with your big data initiatives.
How much is the Oracle Big Data Appliance? You can find information about the pricing of the Oracle Big Data Appliance on the Engineered Systems price list.
So is high density data, summarized data then? Yes. In a MapReduce process, the data goes through parallel processes that filter it based on criteria. In that process, the data can also be counted and summarized.
What is the difference between ODS and DW/DM? The definition of an Operational Data Store, Data Warehouse and Data Mart have not changed. Big Data provides a new source of data to correlate and analyze; we are now not only analyzing transactions, but interactions. The filtered results from this new source are expected to be added to ODS / DW / DM.
Master Data Management, Data Quality play bigger role during organize as well in a Bigdata architecture , I do not see that? As filtered data is migrated to the data warehouse, it is important that there is alignment along master data elements. Further, it is important that other data quality rules are applied to ensure usability. The Oracle Information Architecture Framework describes the value of these architecture strategies.
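A minimal single-process Python sketch of that filter-then-summarize flow follows. It only mimics the map, shuffle, and reduce steps in memory and does not use Hadoop; the log format ("page status") and the function names are assumptions for illustration.

```python
from collections import defaultdict


def map_phase(log_lines):
    """Filter: emit (page, 1) for successful requests and drop the rest."""
    for line in log_lines:
        page, status = line.split()
        if status == "200":
            yield page, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Summarize: count the hits per page."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical low density weblog entries.
logs = ["/home 200", "/cart 500", "/home 200", "/cart 200", "/search 404"]
page_counts = reduce_phase(shuffle(map_phase(logs)))
```

Five detailed log lines collapse into two summary counts; on a real cluster the map and reduce steps would run in parallel across nodes, each over its own slice of the data.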
Does R run on the exalytics version of the times ten db? Oracle R Enterprise runs in the Oracle Database.
Where can I get more information on the products mentioned? Go to for more information on the big data-specific products.
Are there any data governance papers for big data? The Oracle Enterprise Architecture group has authored a white paper on data governance for traditional data realms, but it is not yet enhanced for big data.
Where we can download the component software of big data Start at From there you can get information on the specific products and even download NoSQL Database and Big Data Connectors from OTN.
Is real-time data necessarily sparse or low density or they are not correlated like that? Depends on what kind of data it is. High frequency sensor information could be pretty low density, while an online profile used to customize the user experience for a website would typically be of higher density.
Can Oracle Architects advise us on Hadoop configurations and other open source tools? Yes, they can. Please contact your local office.
In all the session the big data focus has been on health care, sales and marketing, social media. How can we use big data with logistics? We picked those industries as examples. But big data certainly applies to logistics. One example would be analyzing trends with package delivery or location. What correlates with delays, for example. For more consumer-facing logistics operations, use of social media to understand what your customers are thinking about service is another.
Can the big data appliance run other open source databases like Cassandra and HBase? HBase is part of the Hadoop distribution included with the product. You can run other databases on the Big Data Appliance.
CEP engine, what does CEP stand for? Complex Event Processing. Oracle CEP is part of the Fusion Middleware suite.
How does times ten fit into all this? TimesTen is part of the Exalytics solution; it is used to cache frequently accessed data to speed up response time. In general, TimesTen is used for those applications that need acceleration (Exalytics and Oracle Billing and Revenue Management are two Oracle examples where TimesTen is used).
Is there any Virtual Environment where we can get a sense of Big Data using our own data for building a POC? Yes. Please contact your Oracle sales representative.
Does the new release of ODI include the ODI Adapter for Hadoop? No. But the "Oracle Data Integrator Application Adapter for Hadoop" is a knowledge module and plugs into the latest version of ODI. It is licensed separately as an Oracle Big Data Connector. See here for more information.