
Data Flow

Oracle Cloud Infrastructure (OCI) Data Flow is a fully managed Apache Spark service that performs processing tasks on extremely large data sets with no infrastructure to deploy or manage. This enables rapid application delivery because developers can focus on app development, not infrastructure management.

Integrating and Preparing Data for Data Science

Watch the Oracle Developer Live event to see how Data Integration and Data Flow can optimize how data is used.

Try an Oracle Cloud Data Flow workshop

Learn how Data Flow makes running Spark applications easy, secure, and simple.

Data Flow features

Managed infrastructure

OCI Data Flow handles infrastructure provisioning, network setup, and teardown when Spark jobs are complete. Storage and security are also managed, which means less work is required for creating and managing Spark applications for big data analysis.

Easier cluster management

With OCI Data Flow, there are no clusters to install, patch, or upgrade, which saves time and operational costs for projects.

Simplified capacity planning

OCI Data Flow runs each Spark job in private dedicated resources, eliminating the need for upfront capacity planning.

Lower costs

With OCI Data Flow, IT pays only for the infrastructure resources that Spark jobs use while they are running.
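
The pay-per-use idea above can be illustrated with a small estimate. This is a rough sketch, not Oracle's billing formula: it assumes cost scales linearly with the number of OCPUs and the run time, and both the function name `estimate_run_cost` and the rate shown are hypothetical placeholders rather than published prices.

```python
def estimate_run_cost(ocpus: int, runtime_seconds: float, rate_per_ocpu_hour: float) -> float:
    """Estimate the cost of one Spark run under an assumed linear pricing model.

    Assumption: billing is proportional to OCPUs used and to wall-clock run time.
    The rate is a hypothetical placeholder, not a published OCI price.
    """
    if ocpus < 0 or runtime_seconds < 0 or rate_per_ocpu_hour < 0:
        raise ValueError("inputs must be non-negative")
    runtime_hours = runtime_seconds / 3600
    return ocpus * runtime_hours * rate_per_ocpu_hour


# A 30-minute run on 4 OCPUs at a made-up rate of 0.5 currency units per OCPU-hour.
example_cost = estimate_run_cost(ocpus=4, runtime_seconds=1800, rate_per_ocpu_hour=0.5)
print(example_cost)
```

Under this assumed model, a period with no running jobs contributes nothing to the total, because no run time accrues.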

Cloud native security and governance

Leverage unmatched security from Oracle Cloud Infrastructure. Authentication, isolation, and all other critical points are addressed. Protect business-critical data with the highest levels of security.

Granular security

OCI Data Flow natively uses Oracle Cloud's Identity and Access Management system to control access to data, so data stays secure.

Managed resources

Set quotas and limits to manage resources available to OCI Data Flow and control costs.

Simplified operations

OCI Data Flow simplifies common operational tasks like log management and access to operational UIs, freeing up developer time to focus on building applications.

Increased visibility

OCI Data Flow makes it easy to see what Spark users are doing by aggregating operational information into a single, searchable UI.

Simple debugging and diagnostics

Tracking down logs and tools to troubleshoot a Spark job can take hours—but not with a consolidated view of log output, Spark history server, and more.

Avoid future costs

Sort, search, and filter to investigate historic applications to better address expensive jobs and avoid unnecessary expenditures.

Manage runaway Spark jobs

Administrators can easily discover and stop live Spark jobs that are running for too long or consuming too many resources and driving up costs.
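
The kind of check an administrator applies when looking for runaway jobs can be sketched in a few lines. This is a minimal illustration, not the service's own logic: the `RunRecord` type, its fields, the state names, and the thresholds are all hypothetical and are not taken from the OCI Data Flow API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RunRecord:
    """Hypothetical summary of one Spark run; not an OCI data model."""
    run_id: str
    state: str            # assumed state names, e.g. "IN_PROGRESS" or "SUCCEEDED"
    started_at: datetime
    ocpus: int


def find_runaway_runs(runs, now, max_duration, max_ocpus):
    """Return live runs that exceed the duration limit or the OCPU limit."""
    flagged = []
    for run in runs:
        if run.state != "IN_PROGRESS":
            continue  # only live runs can be stopped
        running_too_long = now - run.started_at > max_duration
        using_too_much = run.ocpus > max_ocpus
        if running_too_long or using_too_much:
            flagged.append(run)
    return flagged


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
runs = [
    RunRecord("run-a", "IN_PROGRESS", now - timedelta(hours=6), 4),     # too long
    RunRecord("run-b", "IN_PROGRESS", now - timedelta(minutes=30), 64), # too many OCPUs
    RunRecord("run-c", "IN_PROGRESS", now - timedelta(minutes=10), 4),  # within limits
    RunRecord("run-d", "SUCCEEDED", now - timedelta(hours=10), 64),     # already finished
]
flagged = find_runaway_runs(runs, now, max_duration=timedelta(hours=2), max_ocpus=32)
print([run.run_id for run in flagged])
```

Stopping a flagged run would then be done through the console or the service's API, which this sketch does not attempt to show.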

Simplified development

Big data ecosystems require many moving parts and integrations—but OCI Data Flow is compatible with existing Spark investments and big data services, making it easy to manage the service and deliver its results where they’re needed.

Compatible with existing applications

Migrate existing Spark applications from Hadoop or other big data services.

Secure output management

Automatically and securely capture and store Spark jobs' output, then access it through the UI or REST APIs to make analytics available.

Control with REST APIs

All aspects of OCI Data Flow can be managed using simple REST APIs, from application creation to execution to accessing results of Spark jobs.

Oracle Cloud Infrastructure Data Flow Reduces Cost by 75%

With Oracle Cloud Infrastructure Data Flow, we met client SLAs by reducing the time needed for data processing by 75% and by reducing the cost by more than 300%.

Arun Nimmala, Delivery Director Global Services Integration and Analytics Architecture, Oracle

OCI Data Flow key benefits

  • ETL offload

    Oracle Cloud Infrastructure Data Flow manages ETL offload by overseeing Spark jobs, optimizing cost, and freeing up capacity.


  • Active archive

    OCI Data Flow's output management capabilities optimize the ability to query data using Spark.


  • Unpredictable workloads

    Resources can be automatically shifted to handle unpredictable jobs and lower costs. A dashboard provides a view of usage and budget for future planning purposes.


  • Machine learning model training

    Spark and machine learning developers can use Spark’s machine learning library to train and run models on OCI Data Flow.

Related cloud products

Oracle Cloud Infrastructure Data Science

End-to-end machine learning

Oracle Cloud Infrastructure Data Catalog

Self-service data discovery

Oracle Autonomous Data Warehouse

Cloud data warehouse service

Oracle Cloud Infrastructure Object Storage

Build your data lake

Get started with OCI Data Flow

Sign up for a free trial

Sign up for an Oracle Cloud account and try the Data Flow service for free.

Get training

Learn about Oracle Cloud Infrastructure Data Flow.

Hands-on lab

Experience the live product hands-on for free.

Contact sales

Talk to a team member about Oracle Cloud Infrastructure Data Flow.