Data Flow

Oracle Cloud Infrastructure (OCI) Data Flow is a fully managed Apache Spark service that performs processing tasks on extremely large datasets—without infrastructure to deploy or manage. Developers can also use Spark Streaming to perform cloud ETL on their continuously produced streaming data. This enables rapid application delivery because developers can focus on app development, not infrastructure management.

OCI Data Flow demo (1:30)
Ronin and Oracle improve cancer care and deliver on an AI bill of rights

Discover how Ronin leveraged OCI Data Flow with Apache Spark to build a future where every clinical decision is rooted in data, personalized for a given individual, and rendered efficiently with confidence.

Integrating and Preparing Data for Data Science

Watch the Oracle Developer Live Event and see how to use Data Integration and Data Flow to optimize how data is used.

Try an Oracle Cloud Data Flow workshop

Learn how Data Flow makes running Spark applications easy, secure, and simple.

Data Flow features

Managed infrastructure

OCI Data Flow handles infrastructure provisioning, network setup, and teardown when Spark jobs are complete. Storage and security are also managed, which means less work is required for creating and managing Spark applications for big data analysis.

Easier cluster management

With OCI Data Flow, there are no clusters to install, patch, or upgrade, which saves time and operational costs for projects.

Simplified capacity planning

OCI Data Flow runs each Spark job in private dedicated resources, eliminating the need for upfront capacity planning.

Lower costs

With OCI Data Flow, IT only needs to pay for the infrastructure resources that Spark jobs use while they are running.

Advanced streaming support capabilities

Run Spark Streaming with zero management, automatic fault tolerance, and automatic patching.

Enable continuous processing

With Spark Streaming support, you gain capabilities for continuous retrieval and continuous availability of processed data. OCI Data Flow handles the heavy lifting of stream processing with Spark, along with the ability to perform machine learning on streaming data using MLlib. OCI Data Flow supports Oracle Cloud Infrastructure (OCI) Object Storage and any Kafka-compatible streaming source, including Oracle Cloud Infrastructure (OCI) Streaming, as data sources and sinks.

Automatic fault tolerance

Spark handles late-arriving data due to outages and can catch up backlogged data over time with watermarking—a Spark feature that maintains, stores, and then aggregates late data—without needing to manually restart the job. OCI Data Flow automatically restarts your application when possible and your application can simply continue from the last checkpoint.
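
To make the watermarking idea concrete, here is a minimal sketch in plain Python. It is a simplified model and not Spark's implementation: the function name, the per-event watermark update, and the sample timestamps are all assumptions made for illustration. Spark itself advances the watermark between micro-batches.

```python
from collections import defaultdict


def count_with_watermark(event_times, window_seconds, allowed_lateness):
    """Count events per tumbling window, ignoring events older than the watermark.

    event_times is a list of event timestamps (seconds) in ARRIVAL order.
    Simplified model (an assumption, not Spark's exact semantics): the
    watermark is the largest event time seen so far minus allowed_lateness,
    and it is updated after every event.
    """
    counts = defaultdict(int)  # window start -> number of accepted events
    dropped = []               # events that arrived after the watermark passed them
    max_seen = None

    for t in event_times:
        watermark = None if max_seen is None else max_seen - allowed_lateness
        if watermark is not None and t < watermark:
            dropped.append(t)
        else:
            window_start = (t // window_seconds) * window_seconds
            counts[window_start] += 1
        max_seen = t if max_seen is None else max(max_seen, t)

    return dict(counts), dropped


# Hypothetical arrival order: the events at 105 s and 100 s show up late,
# as they might after an outage upstream.
arrivals = [101, 112, 125, 105, 131, 100]
counts, dropped = count_with_watermark(arrivals, window_seconds=10, allowed_lateness=20)
print(counts)
print(dropped)
```

In this run the event at 105 s is still within the allowed lateness and is added to its window, while the event at 100 s arrives after the watermark has moved past it and is dropped. A larger allowed lateness keeps more late data at the cost of holding window state longer.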

Cloud native authentication

OCI Data Flow streaming applications can use cloud native authentication via resource principals so applications can run longer than 24 hours.

Cloud native security and governance

Leverage unmatched security from Oracle Cloud Infrastructure. Authentication, isolation, and all other critical points are addressed. Protect business-critical data with the highest levels of security.

Granular security

OCI Data Flow makes native use of Oracle Cloud's Identity and Access Management system to control access to data, so data stays secure.

Managed resources

Set quotas and limits to manage resources available to OCI Data Flow and control costs.

Simplified operations

OCI Data Flow simplifies common operational tasks like log management and access to operational UIs, freeing up developer time to focus on building applications.

Increased visibility

OCI Data Flow makes it easy to see what Spark users are doing by aggregating operational information into a single, searchable UI.

Simple debugging and diagnostics

Tracking down logs and tools to troubleshoot a Spark job can take hours—but not with a consolidated view of log output, Spark history server, and more.

Avoid future costs

Sort, search, and filter to investigate historic applications to better address expensive jobs and avoid unnecessary expenditures.

Manage runaway Spark jobs

Administrators can easily discover and stop live Spark jobs that are running for too long or consuming too many resources and driving up costs.
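
The kind of check an administrator applies here can be sketched in a few lines of Python. This is only an illustration: the record fields (`name`, `state`, `started`, `num_executors`), the state strings, the thresholds, and the sample runs are hypothetical and are not taken from the Data Flow UI or API.

```python
from datetime import datetime, timedelta, timezone


def find_runaway_runs(runs, now, max_duration, max_executors):
    """Return names of live runs that exceed a duration or executor limit."""
    flagged = []
    for run in runs:
        if run["state"] != "IN_PROGRESS":
            continue  # only live runs can be stopped
        too_long = now - run["started"] > max_duration
        too_big = run["num_executors"] > max_executors
        if too_long or too_big:
            flagged.append(run["name"])
    return flagged


# Hypothetical run records for illustration only.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
runs = [
    {"name": "nightly-etl", "state": "IN_PROGRESS",
     "started": now - timedelta(minutes=30), "num_executors": 4},
    {"name": "stuck-stream-join", "state": "IN_PROGRESS",
     "started": now - timedelta(hours=6), "num_executors": 8},
    {"name": "oversized-training", "state": "IN_PROGRESS",
     "started": now - timedelta(minutes=15), "num_executors": 64},
    {"name": "finished-report", "state": "SUCCEEDED",
     "started": now - timedelta(hours=11), "num_executors": 32},
]

flagged = find_runaway_runs(runs, now, max_duration=timedelta(hours=2), max_executors=16)
print(flagged)
```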

Simplified development

Big data ecosystems require many moving parts and integrations—but OCI Data Flow is compatible with existing Spark investments and big data services, making it easy to manage the service and deliver its results where they’re needed.

Compatible with existing applications

Migrate existing Spark applications from Hadoop or other big data services.

Secure output management

Automatically and securely capture and store the output of Spark jobs, and then access it through the UI or REST APIs to make analytics available.

Control with REST APIs

All aspects of OCI Data Flow can be managed using simple REST APIs, from application creation to execution to accessing results of Spark jobs.
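
As a rough sketch of what driving the service over REST can look like, the Python below only assembles a request that would start a run of an existing application. The endpoint host pattern, the `/20200129/runs` path, and the body field names are assumptions to verify against the Data Flow API reference. The helper name and the OCID placeholders are hypothetical, and OCI request signing, which a real call requires, is left out.

```python
import json


def build_create_run_request(region, application_id, compartment_id,
                             display_name, arguments=None):
    """Return (method, url, body) for a request that starts an application run.

    Nothing is sent over the network; the caller would still need to sign
    the request and send it with an HTTP client.
    """
    # Assumed endpoint pattern and API version; check the API reference.
    url = f"https://dataflow.{region}.oci.oraclecloud.com/20200129/runs"
    payload = {
        "applicationId": application_id,
        "compartmentId": compartment_id,
        "displayName": display_name,
    }
    if arguments:
        payload["arguments"] = list(arguments)
    return "POST", url, json.dumps(payload, sort_keys=True)


# Placeholder identifiers for illustration only.
method, url, body = build_create_run_request(
    region="us-ashburn-1",
    application_id="ocid1.dataflowapplication.oc1..example",
    compartment_id="ocid1.compartment.oc1..example",
    display_name="nightly-etl",
    arguments=["--date", "2024-01-01"],
)
print(method, url)
print(body)
```

Keeping request construction separate from signing and sending makes it easy to inspect or unit test the payload before anything reaches the service.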

Oracle Cloud Infrastructure Data Flow Reduces Cost by 75%

With Oracle Cloud Infrastructure Data Flow, we met client SLAs by reducing the time needed for data processing by 75% and by reducing the cost by more than 300%.

Arun Nimmala, Delivery Director Global Services Integration and Analytics Architecture, Oracle

OCI Data Flow key benefits

  • Accelerate workflows with NVIDIA RAPIDS

    OCI Data Flow supports the NVIDIA RAPIDS Accelerator for Apache Spark to help accelerate data science, machine learning, and AI workflows.

  • ETL offload

    Data Flow manages ETL offload by overseeing Spark jobs, optimizing cost, and freeing up capacity.

  • Active archive

    Data Flow's output management capabilities optimize the ability to query data using Spark.

  • Unpredictable workloads

    Resources can be automatically shifted to handle unpredictable jobs and lower costs. A dashboard provides a view of usage and budget for future planning purposes.

  • Machine learning model training

    Spark and machine learning developers can use Spark’s machine learning library and run models more efficiently using Data Flow.

  • Spark Streaming

    Gain Spark Streaming support with zero management, automatic fault tolerance with end-to-end exactly-once guarantees, and automatic patching.

    Read about some of the above use cases

Related cloud products

Oracle Cloud Infrastructure Data Science

End-to-end machine learning

Oracle Cloud Infrastructure Data Catalog

Self-service data discovery

Oracle Autonomous Data Warehouse

Cloud data warehouse service

Oracle Cloud Infrastructure Object Storage

Build your data lake

Get started with OCI Data Flow

Sign up for a free trial

Sign up for an Oracle Cloud account and try the Data Flow service for free.

Get training

Learn about Oracle Cloud Infrastructure Data Flow.

Hands-on lab

Experience the live product hands-on for free.

Contact sales

Talk to a team member about Oracle Cloud Infrastructure Data Flow.