Data Lake

A data lake is a repository for structured, semistructured, and unstructured data in any format and size and at any scale that can be analyzed easily. With Oracle Cloud Infrastructure (OCI), you can build a secure, cost-effective, and easy-to-manage data lake. A data lake on OCI is tightly integrated with your preferred data warehouses and analytics as well as with other OCI services, such as data catalog, security, and observability services.

Any data, any source

Move your data in batches or streams seamlessly to an OCI data lake where it can be analyzed. Leverage OCI Data Integration, OCI GoldenGate, or OCI Streaming to ingest your data and store it in OCI Object Storage.

So long, data silos!

A central data lake on OCI integrates with your preferred tools, including databases such as Oracle Autonomous Data Warehouse and MySQL HeatWave, analytics and machine learning (ML) tools such as Oracle Analytics Cloud, and open source projects such as Apache Spark.

Leverage AI and ML

A comprehensive set of AI and ML services lets you gain new insights from your data, make predictions, lower your operational overhead, and improve customer experience.

Discover and secure your data

Catalog your data and gather insights about your data lake with OCI Data Catalog. Enable query tools and databases to discover and query your data in the object store.

Get early access to OCI Data Lake

Oracle Cloud Infrastructure is launching a fully managed data lake service called OCI Data Lake this year. You can sign up for early access to explore its features and capabilities before it's released to the public.

Why use a data lake on OCI?


Modernize your data lake

A data lake makes it possible to work with more kinds of data, but managing it can take significant time and effort. OCI offers fully managed open source data lake services, so you can expect reduced operational costs, improved scalability and security, and the ability to bring all of your current data together in one place.


Extend your data warehouse

Data warehouses and data marts are crucial to successful businesses. Integrating them with a data lake will increase their value even more. Integration among databases, data warehouses, and a data lake with Oracle means that data can be accessed from multiple locations with a single SQL query. Current applications and tools get transparent access to all data, with no changes and no need to learn new skills.


Utilize advanced analytics for Oracle applications

Data generated by enterprise applications is highly valuable, but it’s rarely fully utilized. A data lake on OCI simplifies access to data from multiple applications and enables sophisticated analysis that can mean the difference between a good quarter and a bad quarter.

Data lake integrated solution on OCI

Centralize your data with an embedded OCI Data Integration experience.

Query any data from any source without replication.

Use preintegrated applications for fast time to value.

Catalog and govern with an embedded OCI Data Catalog experience.

Secure data with fine-grained, role-based access control policies.
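
As a rough illustration of the fine-grained access control mentioned above, access to individual Object Storage buckets in OCI is typically granted with IAM policy statements. The sketch below only assembles such statements as text. The group, compartment, and bucket names are hypothetical, and the `bucket_policy` helper is an illustrative convenience, not part of any Oracle SDK.

```python
# Sketch: build OCI IAM policy statements that scope access to one bucket.
# All names (groups, compartment, buckets) are hypothetical placeholders.

ALLOWED_VERBS = ("inspect", "read", "use", "manage")


def bucket_policy(group: str, verb: str, compartment: str, bucket: str) -> str:
    """Return a policy statement granting `verb` on objects in a single bucket."""
    if verb not in ALLOWED_VERBS:
        raise ValueError(f"unknown verb: {verb!r}")
    return (
        f"Allow group {group} to {verb} objects "
        f"in compartment {compartment} "
        f"where target.bucket.name='{bucket}'"
    )


statements = [
    # Analysts may only read the curated data.
    bucket_policy("DataAnalysts", "read", "DataLake", "curated-zone"),
    # Engineers fully manage the raw landing area.
    bucket_policy("DataEngineers", "manage", "DataLake", "raw-zone"),
]

for statement in statements:
    print(statement)
```

Keeping one statement per group and bucket makes it easy to review who can touch which zone of the lake. The statements themselves would still need to be added to a policy in the OCI console or through your usual provisioning tooling.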

Oracle data platform unlocks the full potential of your data

  • Combine transactional and analytical data—avoid silos.
  • Leverage Oracle IaaS to Oracle SaaS, or anything in between—select the amount of control desired.
  • Bring any kind of data to the platform—we break the barrier between structured and unstructured data.
  • Explore the power of OCI and its openness to other cloud service providers—we meet you where you are.
  • Use leading Oracle Analytics Cloud reporting or any third-party analytical application—OCI is open.

Figure: Oracle data platform overview. The diagram shows the Oracle data platform with data sources, data movement services such as integration services, the core of the Oracle modern data platform, and possible outcome and application development services.

Integrate Autonomous Database with data lakes

Oracle Autonomous Database supports integration with data lakes—not just on Oracle Cloud Infrastructure, but also on Amazon Web Services (AWS), Microsoft Azure, Google Cloud, and more. You have the option of loading data into the database or querying the data directly in the source object store. Both approaches use the same tools and APIs to access the data.

This architecture is sometimes referred to as a lakehouse architecture.
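
To make the two options concrete, the sketch below generates the PL/SQL calls that the `DBMS_CLOUD` package in Autonomous Database offers for each: `COPY_DATA` to load files into an existing table, and `CREATE_EXTERNAL_TABLE` to query them in place. The region, namespace, bucket, credential, table, and column names are all hypothetical, and the code only builds statement text. Connecting to a database and running the statements is deliberately left out.

```python
import json


def object_uri(region: str, namespace: str, bucket: str, name: str) -> str:
    """Build a native OCI Object Storage URI for an object or wildcard pattern."""
    return (
        f"https://objectstorage.{region}.oraclecloud.com"
        f"/n/{namespace}/b/{bucket}/o/{name}"
    )


def copy_data_call(table: str, credential: str, uri: str, fmt: dict) -> str:
    """PL/SQL block that loads the files into an existing database table."""
    return (
        "BEGIN\n"
        "  DBMS_CLOUD.COPY_DATA(\n"
        f"    table_name      => '{table}',\n"
        f"    credential_name => '{credential}',\n"
        f"    file_uri_list   => '{uri}',\n"
        f"    format          => '{json.dumps(fmt)}'\n"
        "  );\n"
        "END;"
    )


def external_table_call(
    table: str, credential: str, uri: str, fmt: dict, columns: str
) -> str:
    """PL/SQL block that exposes the files as a table without loading them."""
    return (
        "BEGIN\n"
        "  DBMS_CLOUD.CREATE_EXTERNAL_TABLE(\n"
        f"    table_name      => '{table}',\n"
        f"    credential_name => '{credential}',\n"
        f"    file_uri_list   => '{uri}',\n"
        f"    format          => '{json.dumps(fmt)}',\n"
        f"    column_list     => '{columns}'\n"
        "  );\n"
        "END;"
    )


# Hypothetical values, for illustration only.
uri = object_uri("us-ashburn-1", "mytenancy", "sales-lake", "sales/*.csv")
fmt = {"type": "csv", "skipheaders": "1"}

load_sql = copy_data_call("SALES", "LAKE_CRED", uri, fmt)
external_sql = external_table_call(
    "SALES_EXT", "LAKE_CRED", uri, fmt, "sale_id NUMBER, amount NUMBER"
)

print(load_sql)
print(external_sql)
```

Both calls take the same URI, credential, and format description, which mirrors the point that loading and in-place querying share the same tools. Once the external table exists, it can be queried and joined with ordinary SQL alongside tables stored in the database.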



Figure: Autonomous Database with data lakes. The diagram shows an architecture of a data platform leveraging Oracle Autonomous Database, with data sources, Oracle Autonomous Database, and outcomes.

Real-time analytics across all your data with MySQL HeatWave Lakehouse

One MySQL cloud database service for transactions, real-time analytics across data warehouses and data lakes, and machine learning—without the complexity, latency, risks, and cost of ETL duplication.



Figure: MySQL HeatWave Lakehouse. The diagram shows an architecture of a data platform leveraging Oracle MySQL HeatWave, with data sources, MySQL HeatWave, and outcomes.

Build a data lake with Oracle-managed open source services

Quickly create Hadoop-based or Spark-based data lakes to extend your data warehouses and ensure all data is both easily accessible and managed cost-effectively.



Figure: Oracle-managed open source services. The diagram shows an architecture of a data platform leveraging Oracle-managed open source services, such as Hadoop, Spark, and OpenSearch, with data sources, Oracle open source services at the core, and possible outcomes.

Data lake services from Oracle

Data motion and integration

Connect and extend analytical applications with real-time consistent transactional data, efficient batch loads, and streaming data.

  • OCI Data Integration
    Simplify your complex data extract, transform, and load processes (ETL/E-LT) into data lakes and warehouses for data science and analytics with a no-code data flow designer.
  • Oracle Data Integrator
    Data Integrator provides advanced data migration for extract, transform, and load. Oracle Data Integrator is optimized for Oracle cloud databases as well as on-premises databases.
  • Oracle GoldenGate
    Oracle GoldenGate enables high-availability, real-time data integration, change data capture, data replication, transformations, and verification between operational and analytical enterprise systems.
  • OCI Streaming
    Streaming provides out-of-the-box integrations for hundreds of third-party products across categories such as DevOps, databases, big data, and SaaS applications.

Data lake

Build a data lake using fully managed data services with lower costs and less effort.

  • OCI Data Lake
    Data Lake offers centralized storage and metadata for your structured and unstructured data with unified, fine-grained access control.
  • OCI Object Storage
    Object Storage enables customers to store any type of data in its native format. This is ideal for building modern applications that require scale and flexibility.
  • OCI Data Catalog
    Data Catalog helps data professionals across the organization search, explore, and govern data using an inventory of enterprisewide data assets.
  • OCI Data Flow
    Data Flow is a fully managed Apache Spark service to perform processing tasks on extremely large datasets without infrastructure to deploy or manage. This enables rapid application delivery because developers can focus on app development, not infrastructure management.
  • Oracle Big Data Service
    Big Data Service is a Hadoop-based data lake service to store and analyze large amounts of raw customer data. A managed service, Oracle Big Data Service comes with a fully integrated stack that includes both open source and Oracle value-added tools that simplify your IT operations.

Data lakehouse

Use OCI to integrate your data lakes with your preferred data warehouses and uncover new insights.

  • MySQL HeatWave Lakehouse
    MySQL HeatWave Lakehouse transparently connects to data lakes, letting users process and query hundreds of terabytes of data in the object store in a variety of file formats, including CSV, Parquet, and Aurora/Redshift backups.
  • Autonomous Database lakehouse capabilities
    Autonomous Database enables a self-service data lakehouse, allowing users to load or directly query files on all object stores (including OCI, AWS, Azure, and Google Cloud Platform). Integrated machine learning, spatial, text, and graph analytics enable insights without moving data.

AI and machine learning

Gain insights from data with prebuilt AI models, or create your own.

  • OCI AI Services
    AI Services is a collection of services with prebuilt machine learning models that make it easier for developers to apply AI to applications and business operations. The models can be custom-trained for more accurate business results.
  • OCI Data Science
    Rapidly build, train, deploy, and manage machine learning models with a data science service built for teams.
  • Machine Learning in Oracle Database
    Machine Learning in Oracle Database supports data exploration, preparation, and machine learning modeling at scale.
  • MySQL HeatWave AutoML
    MySQL HeatWave AutoML includes everything users need to build, train, deploy, and explain machine learning models within MySQL HeatWave, at no additional cost.

  • Financial services
    Experian accelerates financial inclusivity with a data lakehouse on OCI.
  • Mining
    MineSense achieved 5X faster queries with a lakehouse on OCI.
  • Advertising
    Beso unified data from 23 online sources with a variety of offline sources to build a data lake that will expand to 100 sources.
  • Sports technology
    With a data lakehouse from Oracle, the Seattle Sounders manage 100X more data, generate insights 10X faster, and have reduced database management.

Data lakehouse partner ecosystem

Oracle partner solutions leverage and augment data lakehouses on OCI.

  • Accenture
  • Capgemini
  • Deloitte
  • h2o.ai
  • qubix
  • Reply Technology
  • Sesame Software
  • wandisco

Informatica is the preferred partner for data integration and governance for data warehouse and lakehouse solutions.


Get started with a data lake on OCI

Try Always Free cloud services and get a 30-day trial

Oracle offers a Free Tier with no time limits on a selection of services, including Autonomous Data Warehouse, OCI Compute, and Oracle Storage products, as well as US$300 in free credits to try additional cloud services. Get the details and sign up for your free account today.

  • What's included with Oracle Cloud Free Tier?

    • Always Free
    • 2 autonomous databases, 20 GB each
    • Compute VMs
    • 100 GB block volume
    • 10 GB object storage

Learn with a hands-on lab

The best way to learn is to try it yourself. Try this free data lake workshop, which demonstrates a typical usage scenario and highlights some of the tools you can use to build a data lake.

  • Access the Data Lake using Autonomous Database and Data Catalog

    The labs in this workshop walk you through the steps you need to access a data lake created with Oracle Object Storage buckets by using Oracle Autonomous Database and OCI Data Catalog.

  • Get Started with Oracle Big Data Service

    Learn how to create and monitor a highly available Hadoop cluster using Big Data Service and OCI. You’ll also add Oracle Cloud SQL to the cluster, access the utility and master nodes, and learn how to use Cloudera Manager and Hue to access the cluster directly in a web browser.

  • Learn analytics and machine learning with Red Bull Racing

    Use analytics and machine learning to analyze 70 years of racing data. Find out what makes some races so exciting you can’t look away while others are more predictable.

  • Get started with Oracle Cloud Infrastructure Anomaly Detection

    Discover how to use OCI Anomaly Detection to create customized machine learning models. You’ll take data uploaded by users, use a specialized algorithm to train a model, and deploy the model into the cloud environment to detect anomalies.


Contact sales

Interested in learning more about a data lake? Let one of our experts help.

  • They can answer questions such as

    • How do I get started with a data lake on Oracle?
    • What can I do with a data lake that I can’t do with a data warehouse?
    • How can my business benefit from a data lake?