Oracle Cloud Infrastructure (OCI) Data Flow is a fully managed Apache Spark service for running processing tasks on extremely large data sets, with no infrastructure to deploy or manage. This enables rapid application delivery because developers can focus on app development, not infrastructure management.
OCI Data Flow handles infrastructure provisioning, network setup, and teardown when Spark jobs are complete. Storage and security are also managed, which means less work is required for creating and managing Spark applications for big data analysis.
With OCI Data Flow, there are no clusters to install, patch, or upgrade, which saves time and operational costs for projects.
OCI Data Flow runs each Spark job on private, dedicated resources, eliminating the need for upfront capacity planning.
With OCI Data Flow, IT pays only for the infrastructure resources that Spark jobs use while they are running.
Leverage unmatched security from Oracle Cloud Infrastructure. Authentication, isolation, and all other critical points are addressed. Protect business-critical data with the highest levels of security.
OCI Data Flow natively uses Oracle Cloud's Identity and Access Management system to control access to data and resources, so data stays secure.
Set quotas and limits to manage resources available to OCI Data Flow and control costs.
OCI Data Flow simplifies common operational tasks like log management and access to operational UIs, freeing up developer time to focus on building applications.
OCI Data Flow makes it easy to see what Spark users are doing by aggregating operational information into a single, searchable UI.
Tracking down logs and tools to troubleshoot a Spark job can take hours—but not with a consolidated view of log output, Spark history server, and more.
Sort, search, and filter historical applications to identify expensive jobs and avoid unnecessary expenditures.
Administrators can easily discover and stop live Spark jobs that are running for too long or consuming too many resources and driving up costs.
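The kind of check an administrator might apply here can be sketched in a few lines. The example below works on hypothetical in-memory run records: the `RunRecord` type, its field names, the state strings, and the thresholds are all illustrative assumptions, not the OCI Data Flow data model. It flags in-progress runs that exceed a duration or OCPU budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical summary of a Spark run; not the OCI Data Flow schema."""
    name: str
    state: str               # illustrative values such as "IN_PROGRESS" or "SUCCEEDED"
    duration_minutes: float
    ocpus: int


def find_runaway_runs(runs, max_minutes=120.0, max_ocpus=64):
    """Return in-progress runs over either budget, longest-running first."""
    flagged = [
        r for r in runs
        if r.state == "IN_PROGRESS"
        and (r.duration_minutes > max_minutes or r.ocpus > max_ocpus)
    ]
    return sorted(flagged, key=lambda r: r.duration_minutes, reverse=True)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        RunRecord("nightly-etl", "IN_PROGRESS", 310.0, 16),
        RunRecord("adhoc-report", "IN_PROGRESS", 12.0, 8),
        RunRecord("wide-join", "IN_PROGRESS", 45.0, 96),
    ]
    for run in find_runaway_runs(sample):
        print(f"{run.name}: {run.duration_minutes:.0f} min, {run.ocpus} OCPUs")
```

In practice the records would come from the service's run listing rather than a hard-coded list, and the budgets would reflect the team's own cost targets.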
Big data ecosystems require many moving parts and integrations—but OCI Data Flow is compatible with existing Spark investments and big data services, making it easy to manage the service and deliver its results where they’re needed.
Migrate existing Spark applications from Hadoop or other big data services.
Automatically and securely capture and store Spark jobs' output, then access it through the UI or REST APIs to make analytics available.
All aspects of OCI Data Flow can be managed using simple REST APIs, from application creation to execution to accessing results of Spark jobs.
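As a rough illustration of what driving the service over REST might involve, the sketch below assembles the URL and JSON body for a request that starts a run of an existing application. The helper name `build_create_run_request` is hypothetical, and the endpoint host pattern, the `20200129` version prefix, and the camelCase field names are assumptions that should be checked against the OCI Data Flow API reference. OCI also requires requests to be signed, which is not shown here.

```python
import json


def build_create_run_request(region, compartment_id, application_id,
                             display_name, arguments=None):
    """Assemble the URL and JSON body for a 'create run' call.

    Assumptions to verify against the API reference: the endpoint host
    pattern, the 20200129 version prefix, and the camelCase field names.
    Request signing is required by OCI and is deliberately left out.
    """
    url = f"https://dataflow.{region}.oci.oraclecloud.com/20200129/runs"
    body = {
        "compartmentId": compartment_id,
        "applicationId": application_id,
        "displayName": display_name,
    }
    if arguments:
        # Optional command-line arguments passed to the Spark application.
        body["arguments"] = list(arguments)
    return url, json.dumps(body, sort_keys=True)


if __name__ == "__main__":
    # Placeholder identifiers, for illustration only.
    url, body = build_create_run_request(
        "us-ashburn-1",
        "ocid1.compartment.oc1..example",
        "ocid1.dataflowapplication.oc1..example",
        "demo-run",
        ["--date", "2024-01-01"],
    )
    print("POST", url)
    print(body)
```

Keeping request construction separate from signing and sending makes the payload easy to inspect and test before any call is made. An official SDK or CLI would normally handle signing and transport.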
With Oracle Cloud Infrastructure Data Flow, we met client SLAs by reducing the time needed for data processing by 75% and by reducing the cost by more than 300%.
Arun Nimmala, Delivery Director Global Services Integration and Analytics Architecture, Oracle
Oracle Cloud Infrastructure Data Flow manages ETL offload by overseeing Spark jobs, optimizing cost, and freeing up capacity.
OCI Data Flow's output management capabilities streamline querying data with Spark.
Resources can be automatically shifted to handle unpredictable jobs and lower costs. A dashboard provides a view of usage and budget for future planning purposes.
Spark and machine learning developers can use Spark's machine learning library (MLlib) and run their models on OCI Data Flow, with the benefits of a fully managed service.