Oracle R Advanced Analytics for Hadoop (ORAAH) is one of the components in the Oracle Big Data Software Connectors Suite, an option to the Big Data Appliance. At its core, ORAAH provides an R interface not only for manipulating HDFS data, but also for writing mapper and reducer functions in R (where you can also leverage open source CRAN packages) and then invoking those Hadoop jobs from R. Users can pass R objects from the client R object space to their mapper and reducer functions. They can also test MapReduce jobs locally in their client R engine without changing any code, simply by switching a system flag. This makes it easy to debug code before unleashing it on the full Hadoop cluster.
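The workflow described above (user-written mapper and reducer functions, objects exported from the client session, and a switch between local and cluster execution) can be sketched in a few lines. The sketch below is written in Python as a neutral illustration and is not ORAAH code: `run_job`, the `export` argument, and the `local` flag are hypothetical names, and the "cluster" branch is only a thread pool standing in for a real Hadoop submission.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def mapper(key, line, stopwords):
    """Emit (word, 1) for every word that is not in the exported stopword set."""
    for word in line.lower().split():
        if word not in stopwords:
            yield word, 1


def reducer(word, counts):
    """Sum the counts collected for one word."""
    yield word, sum(counts)


def run_job(records, mapper, reducer, export, local=True):
    """Hypothetical driver, not an ORAAH function.

    local=True  runs the map phase serially in the client process (easy to debug).
    local=False runs the map phase on a thread pool, standing in for a cluster.
    The mapper and reducer are identical in both modes.
    """
    def map_one(record):
        key, value = record
        # 'export' carries client-side objects into the mapper as keyword arguments.
        return list(mapper(key, value, **export))

    if local:
        mapped = [map_one(r) for r in records]
    else:
        with ThreadPoolExecutor(max_workers=4) as pool:
            mapped = list(pool.map(map_one, records))

    # Shuffle step: group all emitted values by key.
    groups = defaultdict(list)
    for pairs in mapped:
        for k, v in pairs:
            groups[k].append(v)

    # Reduce step: apply the reducer to each key group.
    result = {}
    for k in sorted(groups):
        for out_key, out_value in reducer(k, groups[k]):
            result[out_key] = out_value
    return result


records = list(enumerate(["the quick brown fox", "the lazy dog", "quick quick fox"]))
export = {"stopwords": {"the"}}  # an object passed from the client session

local_result = run_job(records, mapper, reducer, export, local=True)
cluster_result = run_job(records, mapper, reducer, export, local=False)
```

Because only the `local` flag differs between the two calls, any bug found in the local run is a bug in the same code that would run on the cluster, which is the point of the debugging workflow described above.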
If parallel distributed MapReduce programming isn't your strength, ORAAH also lets you manipulate Hive data using the same kind of transparency layer that Oracle R Enterprise provides, applied here to Hive tables. Just as Oracle R Enterprise maps data.frame functions to Oracle SQL, Oracle R Advanced Analytics for Hadoop uses the same abstraction to map those data.frame functions to HiveQL.
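To make the idea of such a transparency layer concrete, here is a toy sketch of how data-frame-style operations might be recorded and translated into a HiveQL string instead of being executed in memory. It is written in Python purely for illustration; the `HiveFrame` class, its methods, and the `flights` table are all hypothetical and do not reflect how ORAAH or Oracle R Enterprise implement the mapping.

```python
class HiveFrame:
    """Toy proxy for a Hive table: operations build a query rather than move data."""

    def __init__(self, table, columns=("*",), predicates=()):
        self.table = table
        self.columns = tuple(columns)
        self.predicates = tuple(predicates)

    def select(self, *columns):
        # Column selection becomes the SELECT list.
        return HiveFrame(self.table, columns, self.predicates)

    def filter(self, predicate):
        # Row filtering becomes a WHERE condition; several filters are ANDed.
        return HiveFrame(self.table, self.columns, self.predicates + (predicate,))

    def to_hiveql(self):
        sql = "SELECT {} FROM {}".format(", ".join(self.columns), self.table)
        if self.predicates:
            sql += " WHERE " + " AND ".join("({})".format(p) for p in self.predicates)
        return sql


late = HiveFrame("flights").filter("delay > 15").select("carrier", "delay")
query = late.to_hiveql()
print(query)  # SELECT carrier, delay FROM flights WHERE (delay > 15)
```

The user works with familiar frame-like calls while the data stays in Hive, and only the generated query is sent to the engine.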
In addition, ORAAH provides eight prepackaged advanced analytics algorithms: k-means clustering, linear regression, principal component analysis (PCA), non-negative matrix factorization, low-rank matrix factorization, correlation and covariance matrix computations, and feed-forward neural networks. So even if you’re not comfortable turning serial algorithms into parallel distributed MapReduce algorithms, you can still get the benefit of the Hadoop cluster through our high-level R interface.