Increase MySQL performance by orders of magnitude for analytics and mixed workloads. Query data in object storage. Eliminate the need for a separate analytics database or lakehouse platform; separate machine learning (ML) tools; and extract, transform, and load (ETL) duplication. MySQL HeatWave is available on Oracle Cloud Infrastructure (OCI), Amazon Web Services (AWS), and Microsoft Azure.
MySQL HeatWave is a fully managed database service, powered by the HeatWave in-memory query accelerator. It’s the only cloud service that combines transactions, real-time analytics across data warehouses and data lakes, and machine learning in one MySQL Database—without the complexity, latency, risks, and cost of ETL duplication.
With MySQL HeatWave Lakehouse, customers can query half a petabyte of data in object storage and leverage all the benefits of HeatWave, even when their data is stored outside a MySQL Database. With HeatWave AutoML, developers and data analysts can build, train, deploy, and explain machine learning models in MySQL HeatWave without moving data to a separate machine learning service.
See how to process and query hundreds of terabytes of data in the object store in a variety of file formats, such as CSV, Parquet, and export files from other databases.
Discover the novel techniques that power MySQL HeatWave Lakehouse, letting users process and query half a petabyte of data in memory from object storage.
See how easy it is to use MySQL HeatWave Lakehouse on AWS to run analytics queries in real time and perform machine learning on hundreds of terabytes of data from object storage, the database, or both.
See how MySQL Autopilot increases the performance of HeatWave while saving significant time for developers and DBAs.
See how support for generative AI and vector store will let you interact with MySQL HeatWave in natural language and use large language models (LLMs) with proprietary data for better accuracy.
See the query performance and price-performance comparisons from McKnight Consulting Group's 100 TB TPC-H benchmark of MySQL HeatWave versus Snowflake, Amazon Redshift, Databricks, and Google BigQuery.
Migrate to MySQL HeatWave with free step-by-step resources.
HeatWave uses a columnar in-memory representation that facilitates vectorized processing. The data is encoded and compressed prior to being loaded in memory. This compressed and optimized in-memory representation is used for both numeric and string data. This results in significant performance improvements and a reduced memory footprint, translating into reduced costs for customers.
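As a rough illustration of why an encoded columnar layout saves memory and speeds up scans, the sketch below dictionary-encodes a string column in Python. Dictionary encoding is only one common technique and is used here as an assumption; the text does not say which encodings HeatWave applies, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def dictionary_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each string with a small integer code (illustrative only)."""
    dictionary: List[str] = []      # code -> distinct value
    code_of: Dict[str, int] = {}    # distinct value -> code
    codes: List[int] = []
    for value in column:
        if value not in code_of:
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return dictionary, codes


def dictionary_decode(dictionary: List[str], codes: List[int]) -> List[str]:
    """Recover the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


def count_matches(dictionary: List[str], codes: List[int], wanted: str) -> int:
    """Evaluate `column = wanted` by comparing integer codes, not strings."""
    if wanted not in dictionary:
        return 0
    target = dictionary.index(wanted)
    return sum(1 for code in codes if code == target)
```

Each distinct string is stored once, and a filter becomes a tight loop over integers, which is the kind of loop that vectorized execution handles well.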
One of the key design points of the HeatWave engine is to massively partition data across a cluster of HeatWave nodes, which can be operated in parallel. This enables high cache hits for analytic operations and provides very good inter-node scalability. Each HeatWave node within a cluster and each core within a node can process partitioned data in parallel, including parallel scans, joins, group-by, aggregation and top-k processing.
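The partition-then-process-in-parallel idea can be sketched in a few lines of Python. This is a toy model under stated assumptions: threads stand in for HeatWave nodes and cores, rows are hash-partitioned on the group-by key, and all names are hypothetical rather than part of any HeatWave API.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (group key, amount)


def partition_rows(rows: List[Row], num_partitions: int) -> List[List[Row]]:
    """Hash-partition rows on the group key so equal keys land together."""
    partitions: List[List[Row]] = [[] for _ in range(num_partitions)]
    for key, amount in rows:
        # crc32 is deterministic across runs, unlike Python's str hash().
        index = zlib.crc32(key.encode("utf-8")) % num_partitions
        partitions[index].append((key, amount))
    return partitions


def aggregate_partition(partition: List[Row]) -> Dict[str, int]:
    """Compute SUM(amount) GROUP BY key for a single partition."""
    totals: Dict[str, int] = defaultdict(int)
    for key, amount in partition:
        totals[key] += amount
    return dict(totals)


def parallel_group_sum(rows: List[Row], num_partitions: int = 4) -> Dict[str, int]:
    """Aggregate every partition concurrently, then combine the results."""
    partitions = partition_rows(rows, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(aggregate_partition, partitions))
    merged: Dict[str, int] = {}
    for partial in partials:
        # Keys are disjoint across partitions, so a plain update is enough.
        merged.update(partial)
    return merged
```

Because rows with the same key always land in the same partition, the partial results never overlap and the final merge needs no further arithmetic.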
Modifications made by OLTP transactions are propagated in real time to HeatWave and are immediately visible to analytics queries. When users submit a query to the MySQL database, the MySQL query optimizer transparently decides whether the query should be offloaded to the HeatWave cluster for accelerated execution. The decision is based on two conditions: all operators and functions referenced in the query are supported by HeatWave, and the estimated time to process the query with HeatWave is less than with MySQL. If both conditions are met, the query is pushed to the HeatWave nodes for processing. Once processed, the results are sent back to the MySQL database node and returned to users.
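The two-condition offload rule described above can be written as a small predicate. The sketch below is a Python model, not the MySQL optimizer's code: the `QueryPlan` type, the operator names, and the supported-operator set are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class QueryPlan:
    operators: FrozenSet[str]     # operators and functions the query uses
    est_seconds_mysql: float      # estimated cost if run in MySQL
    est_seconds_heatwave: float   # estimated cost if run in HeatWave


# Hypothetical list; the real set of supported operators is not given here.
SUPPORTED_OPERATORS: FrozenSet[str] = frozenset(
    {"scan", "filter", "join", "group_by", "aggregate", "top_k"}
)


def should_offload(
    plan: QueryPlan, supported: FrozenSet[str] = SUPPORTED_OPERATORS
) -> bool:
    """Offload only if every operator is supported AND HeatWave is faster."""
    all_supported = plan.operators <= supported
    faster = plan.est_seconds_heatwave < plan.est_seconds_mysql
    return all_supported and faster
```

A query that fails either test simply runs in MySQL as usual, which is why the decision stays transparent to the application.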
HeatWave implements state-of-the-art algorithms for distributed in-memory analytic processing. Joins within a partition are processed fast by using vectorized build and probe join kernels. The highly optimized network communication between analytics nodes is achieved by using asynchronous batch I/Os. The algorithms are designed to overlap compute time with communication of data across nodes, which helps achieve high scalability.
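For readers unfamiliar with the build and probe terminology, here is a minimal hash join in Python. It is a scalar, single-threaded sketch of the general technique; it is not vectorized and does not model HeatWave's kernels or its inter-node communication, and the function name is an assumption.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


def hash_join(
    build_rows: List[Tuple[int, str]],
    probe_rows: List[Tuple[int, str]],
) -> List[Tuple[int, str, str]]:
    """Inner-join two (key, payload) lists on key using build and probe."""
    # Build phase: index the (usually smaller) input by join key.
    table: DefaultDict[int, List[str]] = defaultdict(list)
    for key, payload in build_rows:
        table[key].append(payload)

    # Probe phase: look up each row of the other input in the hash table.
    result: List[Tuple[int, str, str]] = []
    for key, payload in probe_rows:
        for match in table.get(key, ()):
            result.append((key, match, payload))
    return result
```

For example, joining `[(1, "Ann"), (2, "Bob")]` with `[(1, "o1"), (1, "o2"), (3, "o3")]` keeps only the two rows whose key appears on both sides.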
MySQL Autopilot automates many of the most important and often challenging aspects of achieving high query performance at scale—including provisioning, data loading, query execution, and failure handling. It uses advanced techniques to sample data, collect statistics on data and queries, and build machine-learning models to model memory usage, network load, and execution time. These machine-learning models are then used by MySQL Autopilot to execute its core capabilities. MySQL Autopilot makes the HeatWave query optimizer increasingly intelligent as more queries are executed, resulting in continually improving system performance over time. MySQL Autopilot also provides capabilities designed to improve the performance and price-performance of OLTP workloads. MySQL Autopilot is available at no additional charge for MySQL HeatWave customers.
When data is loaded from MySQL into HeatWave, a copy of the in-memory representation is written to the scale-out data management layer built on the OCI object store. Changes made to data in MySQL are transparently propagated to this data layer. When an operation requires reloading data into HeatWave, such as during error recovery, multiple HeatWave nodes can read the data from the HeatWave data layer in parallel. This significantly improves performance. For example, for a 10 TB HeatWave cluster, the time it takes to recover and reload data drops from 7.5 hours to 4 minutes, an improvement of over 100X.
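The "over 100X" figure follows directly from the two recovery times quoted above. A short Python check, using only those two numbers:

```python
def speedup(before_minutes: float, after_minutes: float) -> float:
    """Ratio of the old duration to the new duration."""
    return before_minutes / after_minutes


before = 7.5 * 60   # 7.5 hours expressed in minutes (450)
after = 4.0         # reload time from the HeatWave data layer, in minutes
ratio = speedup(before, after)   # 450 / 4 = 112.5
```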
HeatWave is designed as a MySQL pluggable storage engine, which completely shields all the low-level implementation details from customers. As a result, applications and tools seamlessly access HeatWave through MySQL, using standard connectors. HeatWave supports the same ANSI SQL standard and ACID properties as MySQL, and supports diverse data types. This enables existing applications to take advantage of HeatWave without any changes.
On-premises customers who can’t move their MySQL deployments to a cloud because of compliance or regulatory requirements can still leverage HeatWave by using the hybrid deployment model. In such a hybrid deployment, customers can use MySQL replication to replicate on-premises MySQL data to HeatWave without the need for ETL.
With in-database machine learning in MySQL HeatWave, available at no extra cost, users don’t need to move data to a separate machine learning service such as Amazon SageMaker—accelerating their ML initiatives, increasing security, and reducing costs. They can apply machine learning training, inference, and explanation to data stored both inside MySQL and in the object store with HeatWave Lakehouse. HeatWave AutoML automates the machine learning lifecycle, including algorithm selection, intelligent data sampling for model training, feature selection, and hyperparameter tuning—saving customers significant time and effort.
Developers and data analysts can build machine learning models using familiar SQL commands; they don’t have to learn new tools and languages. Additionally, HeatWave AutoML is integrated with popular notebooks such as Jupyter and Apache Zeppelin. HeatWave AutoML delivers predictions with an explanation of the results, helping organizations with regulatory compliance, fairness, repeatability, causality, and trust.
Currently in private preview, the vector store will enable customers to leverage the power of large language models (LLMs) with their proprietary data to get answers that are more accurate than those from models trained only on public data. With generative AI and vector store capabilities, customers can interact with MySQL HeatWave in natural language and efficiently search documents in various file formats in HeatWave Lakehouse.
The vector store ingests documents in a variety of formats, including PDF, and stores them as embeddings generated via an encoder model. For a given user query, the vector store identifies the most similar documents by performing a similarity search against the stored embeddings and the embedded query. These documents are used to augment the prompt given to the LLM so that it provides a more contextual answer.
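The retrieve-then-augment flow described above can be sketched in Python. Everything here is an assumption made for illustration: the embeddings are hand-written vectors rather than the output of an encoder model, cosine similarity is one common choice of similarity measure, and the prompt template and function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: List[float], store: Dict[str, List[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k stored documents most similar to the embedded query."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in store.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def augment_prompt(question: str, passages: List[str]) -> str:
    """Prepend the retrieved passages to the question sent to the LLM."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

In a real deployment the encoder model would produce both the stored embeddings and the query embedding, and the augmented prompt would be passed to the LLM.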
Real-time elasticity enables customers to increase or decrease the size of their HeatWave cluster by any number of nodes without incurring any downtime or read-only time. The resizing operation takes only a few minutes, during which HeatWave remains online and available for all operations. Once the cluster is resized, data is downloaded from object storage, automatically repartitioned among all available cluster nodes, and becomes immediately available for queries. As a result, customers benefit from consistently high performance, even at peak times, and can reduce costs by downsizing their HeatWave cluster when appropriate. Customers aren’t constrained to the overprovisioned instances forced by the rigid sizing models of other cloud database providers. With efficient data reloading from object storage, customers can also pause and resume their HeatWave cluster to reduce costs.
MySQL HeatWave Lakehouse enables users to query half a petabyte of data in object storage—in a variety of file formats, such as CSV, Parquet, Avro, and export files from other databases. The query processing is done entirely in the HeatWave engine, enabling customers to take advantage of HeatWave for non-MySQL workloads in addition to MySQL-compatible workloads. Customers can query data in various formats in object storage, transactional data in MySQL databases, or a combination of both using standard SQL commands. Querying the data in object storage is as fast as querying the databases. With HeatWave AutoML, customers can use data in object storage, the database, or both to automatically build, train, deploy, and explain ML models—without moving the data to a separate ML cloud service. The HeatWave cluster scales to 512 nodes to process half a petabyte of data, and the data isn’t copied to the MySQL database.
As demonstrated by a 500 TB TPC-H benchmark, the query performance of MySQL HeatWave Lakehouse is 9X faster than Amazon Redshift, 17X faster than Snowflake, 17X faster than Databricks, and 36X faster than Google BigQuery. The load performance of MySQL HeatWave Lakehouse is 2X faster than Snowflake, 6X faster than Databricks, 8X faster than Google BigQuery, and 9X faster than Amazon Redshift.
Tamara, a fintech startup from Saudi Arabia, moved its database workloads to MySQL HeatWave for 3X greater performance and 60% lower costs than with another cloud provider. The company has grown its client base to more than 2 million users and onboarded 3,000 merchants.
This global high-tech solution provider in the telecom industry speeds up complex queries by 139X with MySQL HeatWave on AWS—simplifying its infrastructure for OLTP and OLAP while delivering subsecond response times to customers.
Japan’s leading advertising network delivers real-time insights and significantly reduces costs with MySQL HeatWave and Autonomous Database.