MySQL HeatWave Lakehouse Features

Unified query engine for non-MySQL and MySQL workloads

Using standard SQL syntax, you can query data in object storage in a variety of file formats such as CSV, Parquet, and Avro, as well as export files from other databases, transactional data in MySQL databases, or any combination of these. Data isn’t copied to the MySQL database. Instead, query processing is done entirely in the HeatWave engine, so you can use HeatWave for non-MySQL workloads and MySQL-compatible workloads alike. When loaded into the HeatWave cluster, data from any source is automatically transformed into a single optimized internal format. As a result, querying data in object storage is as fast as querying the databases, an industry first.
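The idea of loading heterogeneous sources into one internal format and then querying them with ordinary SQL can be illustrated with a toy sketch. This is not HeatWave's actual internal format or API; it uses SQLite purely as a stand-in for a unified in-memory store, and the table and column names are hypothetical:

```python
import csv, io, sqlite3

# Two "files" from different sources (hypothetical sample data).
orders_csv = "order_id,amount\n1,20\n2,5\n"
customers_csv = "customer_id,name\n1,Ada\n2,Grace\n"

# An in-memory SQLite database stands in for the single internal format
# that all loaded data is transformed into.
db = sqlite3.connect(":memory:")

def load_csv(name, text):
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    db.execute(f"CREATE TABLE {name} ({', '.join(header)})")
    placeholders = ",".join("?" * len(header))
    db.executemany(f"INSERT INTO {name} VALUES ({placeholders})", data)

load_csv("orders", orders_csv)
load_csv("customers", customers_csv)

# Once loaded, both sources answer to the same standard SQL,
# including queries that combine them.
total = db.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
combined = db.execute(
    "SELECT c.name, o.amount FROM orders o "
    "JOIN customers c ON o.order_id = c.customer_id"
).fetchall()
print(total, combined)
```

The point of the sketch is only the shape of the workflow: transform once at load time, then run every query, including cross-source joins, against one format.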

Scale-out architecture

Oracle MySQL HeatWave’s unrivaled performance is a result of its scale-out architecture, which enables massive parallelism across up to 512 nodes for provisioning the cluster, loading data, and processing queries. Each HeatWave node within a cluster, and each core within a node, can process partitioned data in parallel, including parallel scans, joins, group-bys, aggregations, and top-k processing. The algorithms are designed to overlap compute time with the communication of data across nodes, which helps achieve high scalability.
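The partition-then-merge pattern behind this architecture can be sketched in a few lines. This is a toy illustration, not HeatWave code: threads stand in for nodes, and the data and the value of k are hypothetical. Each "node" computes a partial group-by over its partition; a coordinator merges the partials and takes the global top-k:

```python
import heapq
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical event stream, partitioned across 4 "nodes"
# (real HeatWave nodes are separate machines holding partitioned data).
events = ["a", "b", "a", "c", "a", "b", "d", "a", "b", "c"] * 10
partitions = [events[i::4] for i in range(4)]

def local_aggregate(partition):
    # Each node aggregates only its own partition (partial group-by).
    return Counter(partition)

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(local_aggregate, partitions))

# The coordinator merges partial aggregates, then takes the global top-k.
merged = Counter()
for partial in partials:
    merged.update(partial)
top2 = heapq.nlargest(2, merged.items(), key=lambda kv: kv[1])
print(top2)  # [('a', 40), ('b', 30)]
```

Because each partial aggregate is small relative to the raw partition, this design keeps inter-node communication cheap, which is what makes overlapping compute with data transfer effective.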

Machine learning–powered automation with MySQL Autopilot

MySQL Autopilot provides workload-aware automation for MySQL HeatWave powered by machine learning (ML). MySQL Autopilot capabilities, such as auto provisioning, auto query plan improvement (which learns various runtime statistics from past query executions to improve the execution plan for future queries), and auto parallel loading, have been enhanced for MySQL HeatWave Lakehouse. Additional capabilities for HeatWave Lakehouse include the following:

  • Auto schema inference automatically infers the mapping of file data to the corresponding schema definition for all supported file types, including CSV. As a result, you don’t need to manually define and update the schema mapping of files, saving time and effort.
  • Adaptive data sampling intelligently samples files in object storage to derive the information that drives MySQL Autopilot’s predictions for automation. Using adaptive data sampling, MySQL Autopilot can scan a 400 TB file and make predictions, such as the schema mapping, in less than one minute.
  • Adaptive data flow lets MySQL HeatWave Lakehouse dynamically adapt to the performance of the underlying object store in any region to improve overall performance and availability.
  • Adaptive query optimization uses various statistics to adjust data structures and system resources after query execution has started, independently optimizing query execution for each node based on actual data distribution at runtime. This helps improve the performance of ad hoc queries by up to 25%.
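The combination of schema inference and sampling described above can be sketched as follows. This is a deliberately simplified illustration with hypothetical data and type rules; HeatWave's actual inference logic is not public:

```python
import csv, io, random

# A small CSV "file" (hypothetical sample data).
csv_text = "id,price,label\n1,9.99,apple\n2,3.50,pear\n3,1.25,fig\n4,7.00,plum\n"

def all_parse_as(values, cast):
    # True if every sampled value converts cleanly with the given cast.
    try:
        for v in values:
            cast(v)
        return True
    except ValueError:
        return False

def infer_type(values):
    # Toy type lattice: INT < DOUBLE < VARCHAR.
    if all_parse_as(values, int):
        return "INT"
    if all_parse_as(values, float):
        return "DOUBLE"
    return "VARCHAR"

rows = list(csv.DictReader(io.StringIO(csv_text)))
# Sample rows instead of scanning the whole file, as adaptive sampling would.
sample = random.sample(rows, k=min(3, len(rows)))
schema = {col: infer_type([r[col] for r in sample]) for col in rows[0]}
print(schema)  # {'id': 'INT', 'price': 'DOUBLE', 'label': 'VARCHAR'}
```

The sketch shows why sampling scales: the cost of inference depends on the sample size, not on the file size, which is how predictions on very large files can stay fast.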

Built-in machine learning

With HeatWave AutoML, you can use data in object storage, the database, or both to build, train, deploy, and explain ML models. You don’t need to move the data to a separate ML cloud service or be an ML expert. HeatWave AutoML automates the machine learning pipeline, including algorithm selection, intelligent data sampling for model training, feature selection, and hyperparameter optimization, saving data analysts significant time and effort. HeatWave AutoML supports anomaly detection, forecasting, classification, regression, and recommender system tasks, even on text columns. You can use HeatWave AutoML at no additional cost.
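One step of such a pipeline, hyperparameter optimization, can be sketched with a toy classifier. This is an illustration of the general idea only; the data, the threshold model, and the grid of candidates are all hypothetical, and HeatWave AutoML's actual search strategy is more sophisticated:

```python
# Labeled points: (feature value, class label) -- hypothetical data.
points = [(0.5, 0), (1.0, 0), (1.5, 0), (3.0, 1), (3.5, 1), (4.0, 1)]

def accuracy(threshold):
    # A toy one-parameter model: predict class 1 when x > threshold.
    correct = sum((x > threshold) == bool(y) for x, y in points)
    return correct / len(points)

# Evaluate each candidate hyperparameter and keep the best,
# as a simple grid search would.
candidates = [1.0, 2.0, 3.0]
best = max(candidates, key=accuracy)
print(best, accuracy(best))  # 2.0 1.0
```

Automating this loop (and the analogous loops for algorithm and feature selection) is what removes the need for manual ML expertise.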

Generative AI with MySQL HeatWave vector store

Using large language models (LLMs), applications can interact with HeatWave Lakehouse in natural language. Currently in private preview, the vector store will enable you to combine the power of LLMs with your proprietary data to get more relevant and accurate answers than those derived from models trained on only public data. With generative AI and vector store capabilities, you’ll be able to interact with MySQL HeatWave in natural language and efficiently search proprietary documents in various file formats in HeatWave Lakehouse.
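The core operation of a vector store, finding the stored document whose embedding is most similar to the query's embedding, can be sketched with cosine similarity. The document names and three-dimensional vectors below are hypothetical; real systems use LLM-generated embeddings with hundreds or thousands of dimensions:

```python
import math

# Hypothetical document embeddings (toy 3-d vectors).
docs = {
    "refund policy":  [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "api reference":  [0.0, 0.2, 0.9],
}

def cosine(a, b):
    # Cosine similarity: dot product divided by the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Pretend this vector embeds the question "how do I get my money back?".
query = [0.85, 0.15, 0.05]
best = max(docs, key=lambda d: cosine(docs[d], query))
print(best)  # refund policy
```

In a retrieval-augmented setup, the documents retrieved this way are passed to the LLM as context, which is what grounds its answers in proprietary data rather than only in its public training data.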

Highly available, fully managed database service

As a fully managed database service, MySQL HeatWave automates tasks such as high-availability management, patching, upgrades, and backups. Data loaded into the HeatWave cluster is automatically recovered after an unexpected compute node failure, without needing to be retransformed from external data formats.

Secure access control

With access control mechanisms such as Oracle Cloud Infrastructure (OCI) resource principal authentication and pre-authenticated requests, you have full control over access to data lake sources. When running HeatWave Lakehouse on AWS, you can define identity and access management (IAM) roles and policies to grant access only to specific S3 data.
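As an illustration of scoping access to specific S3 data, the sketch below builds an AWS IAM policy document that allows reads only under one prefix of one bucket. The bucket and prefix names are hypothetical, and the exact permissions HeatWave Lakehouse requires are defined in the Oracle and AWS documentation, not here:

```python
import json

# Hypothetical IAM policy: read-only access limited to the "sales/" prefix
# of one bucket (names are made up for illustration).
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Allow listing, but only within the chosen prefix.
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::example-lakehouse-bucket",
            "Condition": {"StringLike": {"s3:prefix": "sales/*"}},
        },
        {
            # Allow reading objects, but only under that prefix.
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-lakehouse-bucket/sales/*",
        },
    ],
}
print(json.dumps(policy, indent=2))
```

Attaching a narrowly scoped policy like this to the role the service assumes is what restricts the data lake to exactly the intended S3 data.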