Oracle Cloud Infrastructure (OCI) Data Flow is a fully managed Apache Spark service that performs processing tasks on extremely large datasets—without infrastructure to deploy or manage. Developers can also use Spark Streaming to perform cloud ETL on their continuously produced streaming data. This enables rapid application delivery because developers can focus on app development, not infrastructure management.
Watch the Oracle Developer Live event and see how to use Data Integration and Data Flow to optimize how data is used.
Learn how Data Flow makes running Spark applications simple and secure.
OCI Data Flow handles infrastructure provisioning, network setup, and teardown when Spark jobs are complete. Storage and security are also managed, which means less work is required for creating and managing Spark applications for big data analysis.
With OCI Data Flow, there are no clusters to install, patch, or upgrade, which saves time and operational costs for projects.
OCI Data Flow runs each Spark job in private dedicated resources, eliminating the need for upfront capacity planning.
With OCI Data Flow, IT only needs to pay for the infrastructure resources that Spark jobs use while they are running.
Spark Streaming with zero management, automatic fault-tolerance, and automatic patching.
With Spark Streaming support, you gain capabilities for continuous retrieval and continuous availability of processed data. OCI Data Flow handles the heavy lifting of stream processing with Spark, along with the ability to perform machine learning on streaming data using Spark's MLlib. OCI Data Flow supports OCI Object Storage and any Kafka-compatible streaming source, including OCI Streaming, as data sources and sinks.
Spark handles data that arrives late due to outages and can catch up on backlogged data over time with watermarking, a Spark feature that tracks event time and keeps aggregation state long enough for late records to still be counted, without needing to manually restart the job. OCI Data Flow automatically restarts your application when possible, and the application simply continues from the last checkpoint.
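As a sketch, the pattern above maps onto a standard PySpark Structured Streaming job; the Kafka bootstrap server, topic name, and Object Storage paths below are placeholder assumptions, and the job must run inside a Spark runtime such as Data Flow:

```python
# Sketch of a Data Flow streaming job (PySpark Structured Streaming).
# Endpoint, topic, bucket, and namespace below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = SparkSession.builder.appName("streaming-etl-sketch").getOrCreate()

# Read from any Kafka-compatible source, e.g., OCI Streaming's Kafka endpoint.
events = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "streaming-endpoint.example.com:9092")
    .option("subscribe", "input-topic")
    .load())

# The watermark bounds how long Spark keeps aggregation state: records
# arriving more than 10 minutes behind the latest observed event time
# are dropped; anything inside that window is still aggregated.
counts = (events
    .selectExpr("CAST(value AS STRING) AS value", "timestamp")
    .withWatermark("timestamp", "10 minutes")
    .groupBy(window(col("timestamp"), "5 minutes"))
    .count())

# The checkpoint is what lets a restarted run continue where it left off.
query = (counts.writeStream
    .format("parquet")
    .option("path", "oci://bucket@namespace/output/")
    .option("checkpointLocation", "oci://bucket@namespace/checkpoints/")
    .outputMode("append")
    .start())
query.awaitTermination()
```

The checkpoint location is the piece Data Flow relies on when it automatically restarts an application: on restart, Spark replays from the recorded offsets rather than reprocessing the whole stream.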
OCI Data Flow streaming applications can use cloud-native authentication via resource principals, so applications can run longer than 24 hours.
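A minimal sketch of what resource principal authentication looks like from inside a run, using the OCI Python SDK's resource principals signer in place of API keys; the Object Storage call is just an illustrative example:

```python
# Sketch: inside a Data Flow run, obtain credentials from the resource
# principal instead of embedding API keys, so a long-running streaming
# job never has to manage or refresh its own secrets.
import oci

signer = oci.auth.signers.get_resource_principals_signer()
object_storage = oci.object_storage.ObjectStorageClient(config={}, signer=signer)
namespace = object_storage.get_namespace().data
```

This only works when executed inside an OCI resource (such as a Data Flow run) that has been granted a resource principal; outside that environment the signer cannot be constructed.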
Leverage unmatched security from Oracle Cloud Infrastructure. Authentication, isolation, and all other critical points are addressed. Protect business-critical data with the highest levels of security.
OCI Data Flow makes native use of Oracle Cloud's Identity and Access Management system to control access to data, so data stays secure.
Set quotas and limits to manage resources available to OCI Data Flow and control costs.
OCI Data Flow simplifies common operational tasks like log management and access to operational UIs, freeing up developer time to focus on building applications.
OCI Data Flow makes it easy to see what Spark users are doing by aggregating operational information into a single, searchable UI.
Tracking down logs and tools to troubleshoot a Spark job can take hours—but not with a consolidated view of log output, Spark history server, and more.
Sort, search, and filter to investigate historic applications to better address expensive jobs and avoid unnecessary expenditures.
Administrators can easily discover and stop live Spark jobs that are running for too long or consuming too many resources and driving up costs.
Big data ecosystems require many moving parts and integrations—but OCI Data Flow is compatible with existing Spark investments and big data services, making it easy to manage the service and deliver its results where they’re needed.
Migrate existing Spark applications from Hadoop or other big data services.
Automatically and securely capture and store Spark jobs' output, then access it through the UI or REST APIs to make it available for analytics.
All aspects of OCI Data Flow can be managed using simple REST APIs, from application creation to execution to accessing results of Spark jobs.
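As a hedged sketch, the same REST API can be driven through the OCI CLI; the OCIDs below are placeholders, and the exact parameter names should be verified against the Data Flow CLI reference:

```shell
# Sketch: launch a run of an existing Data Flow application via the OCI CLI
# (which wraps the REST API). All OCIDs are placeholders.
oci data-flow run create \
  --compartment-id ocid1.compartment.oc1..example \
  --application-id ocid1.dataflowapplication.oc1..example \
  --display-name "nightly-etl"

# Check the run's state and retrieve its details once it completes.
oci data-flow run get --run-id ocid1.dataflowrun.oc1..example
```

The same create/get operations are available directly over HTTPS and through the OCI SDKs, so runs can be scheduled and monitored from any automation tooling.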
With Oracle Cloud Infrastructure Data Flow, we met client SLAs by reducing the time needed for data processing by 75% and by reducing the cost by more than 300%.
Arun Nimmala, Delivery Director Global Services Integration and Analytics Architecture, Oracle
NVIDIA RAPIDS Accelerator for Apache Spark in OCI Data Flow is supported to help accelerate data science, machine learning, and AI workflows.
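As a sketch, the RAPIDS Accelerator is enabled through Spark configuration once a GPU shape is selected for the application; the properties below come from the RAPIDS Accelerator documentation, and the Data Flow-specific setup should be confirmed against the service docs:

```
spark.plugins=com.nvidia.spark.SQLPlugin
spark.rapids.sql.enabled=true
```

With the plugin active, supported SQL and DataFrame operations are transparently executed on the GPU, with unsupported operations falling back to the CPU.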
Data Flow manages ETL offload by overseeing Spark jobs, optimizing cost, and freeing up capacity.
Data Flow's output management capabilities optimize the ability to query data using Spark.
Resources can be automatically shifted to handle unpredictable jobs and lower costs. A dashboard provides a view of usage and budget for future planning purposes.
Spark and machine learning developers can use Spark’s machine learning library and run models more efficiently using Data Flow.
Gain Spark Streaming support with zero management, automatic fault tolerance with end-to-end exactly-once guarantees, and automatic patching.
Sign up for an Oracle Cloud account and try the Data Flow service for free.
Learn about Oracle Cloud Infrastructure Data Flow.
Experience the live product hands-on for free.
Talk to a team member about Oracle Cloud Infrastructure Data Flow.