Oracle R Advanced Analytics for Hadoop (ORAAH) is one of the components of the Oracle Big Data Connectors software suite, an option for the Oracle Big Data Appliance. At its core, ORAAH provides an R interface for manipulating data stored in HDFS, both through Hive transparency capabilities and by mapping HDFS files as direct input to machine learning algorithms.
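As a brief sketch of what working with HDFS from R looks like (`hdfs.put`, `hdfs.attach`, and `hdfs.ls` are ORAAH functions; the data and paths below are purely illustrative):

```r
# Load the ORAAH package, installed as part of Oracle Big Data Connectors
library(ORCH)

# Copy a client-side R data.frame into HDFS and get back a handle
# that ORAAH's analytics functions accept as input
cars.dfs <- hdfs.put(cars, dfs.name = "cars_data")

# Alternatively, attach a file that already lives in HDFS
# (path is illustrative)
# cars.dfs <- hdfs.attach("/user/oracle/cars_data")

# List the contents of the current HDFS working directory
hdfs.ls()
```

The handle returned by `hdfs.put`/`hdfs.attach` is a lightweight metadata object, so the data itself never has to be pulled back to the client.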
The newly released Oracle R Advanced Analytics for Hadoop 2.5.0 includes two new algorithm implementations that can take advantage of an Apache Spark cluster for significant performance gains in model build and scoring time. These algorithms are a redesigned version of the Multi-Layer Perceptron Neural Networks (orch.neural) and a brand-new implementation of a Logistic Regression model (orch.glm2).
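A model build with the new Spark-enabled algorithms might look like the following sketch. The function names `orch.glm2` and `orch.neural` come from the release itself; the Spark connection call and all argument names (formula, data handle, hidden-layer sizes) are assumptions for illustration only:

```r
library(ORCH)

# Connect to the Spark cluster so the Spark-enabled algorithms can use it
# (connection arguments are illustrative)
spark.connect(master = "yarn-client", memory = "2g")

# Attach training data already stored in HDFS (path illustrative)
train <- hdfs.attach("/user/oracle/churn_train")

# Build the new Spark-based logistic regression model
mod.glm <- orch.glm2(CHURN ~ AGE + BALANCE + TENURE, train)

# The redesigned neural network runs against the same Spark session
mod.nn <- orch.neural(CHURN ~ AGE + BALANCE + TENURE, train,
                      hiddenSizes = c(10, 5))
```

Because both algorithms read their input directly from HDFS and execute on the cluster, neither the training data nor the model build ever touches the client R session's memory.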
The platform also allows for writing mapper and reducer functions in R, where open source CRAN packages can be leveraged. Users can pass R objects from the client R object space to their mapper and reducer functions, and can test MapReduce jobs locally in their client R engine without changing any code, just by switching a system flag. This makes it easy to debug code before unleashing it on the full Hadoop cluster.
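The local-testing workflow can be sketched as follows. `hadoop.run`, `orch.keyval`, and the `orch.dryrun` flag are ORAAH API names as I understand them; the mapper/reducer logic and dataset are illustrative:

```r
library(ORCH)

# Flip the system flag: the job below now runs locally in the client
# R engine, which makes debugging with ordinary R tools easy
orch.dryrun(TRUE)

res <- hadoop.run(
  data = hdfs.put(iris),
  mapper = function(key, val) {
    # emit species as the key, sepal length as the value
    orch.keyval(val$Species, val$Sepal.Length)
  },
  reducer = function(key, vals) {
    # average the sepal lengths collected for each species
    orch.keyval(key, mean(unlist(vals)))
  }
)

# Flip the flag back and the identical, unchanged code
# executes as a MapReduce job on the Hadoop cluster
orch.dryrun(FALSE)
```

Note that the mapper and reducer are plain R functions, so any CRAN package loaded in them travels with the job.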
If parallel distributed map-reduce programming isn't your strength, ORAAH also lets you manipulate Hive data using the same type of transparency provided by Oracle R Enterprise, but on top of Hive tables. Just as Oracle R Enterprise maps data.frame functions to Oracle SQL, Oracle R Advanced Analytics for Hadoop uses the same abstraction to map those data.frame functions to HiveQL.
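In practice the Hive transparency layer looks like ordinary data.frame code. The sketch below assumes the Oracle R Enterprise-style connection functions (`ore.connect` with a HIVE type, `ore.sync`, `ore.attach`); the table and column names are invented for illustration:

```r
library(ORCH)

# Connect to Hive; the transparency layer then surfaces
# Hive tables as virtual data.frame proxies in R
ore.connect(type = "HIVE")
ore.sync()    # map the Hive tables into the R search space
ore.attach()

# Standard data.frame operations are translated to HiveQL
# behind the scenes; SALES here is a Hive table, not local data
high.value <- SALES[SALES$AMOUNT > 1000, c("REGION", "AMOUNT")]
head(high.value)
```

The filter and column projection above execute in Hive; only the few rows returned by `head` are brought back to the client.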
In addition, ORAAH provides ten prepackaged MapReduce advanced analytics algorithms:

- Logistic Regression (Spark-based)
- Multi-Layer Perceptron Feed-Forward Neural Networks (Spark and MapReduce versions)
- Generalized Linear Models (GLM)
- Linear Regression models
- Principal Component Analysis (PCA)
- k-Means clustering
- Non-negative Matrix Factorization
- Low-Rank Matrix Factorization (for collaborative filtering)
- Correlation and Covariance matrix computations
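All of the prepackaged algorithms follow the same pattern: hand them an HDFS data handle and they run distributed on the cluster. As one hedged example (the function names `orch.kmeans` and `orch.princomp` and their arguments are assumptions based on the ORAAH naming convention; the data is illustrative):

```r
library(ORCH)

# Push a small numeric dataset into HDFS to serve as input
pts <- hdfs.put(iris[, 1:4])

# Distributed k-means clustering over the HDFS data
# (argument names assumed for illustration)
km <- orch.kmeans(pts, num.clusters = 3)

# Principal Component Analysis over the same HDFS input
pc <- orch.princomp(pts)
```

The same handle can feed any of the ten algorithms, so switching techniques doesn't require restaging the data.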
So even if you’re not comfortable turning serial algorithms into parallel distributed algorithms in map-reduce, you can get the benefit of the Hadoop cluster using our high-level R interfaces.