HeatWave Features

HeatWave

HeatWave is an in-memory, massively parallel, hybrid columnar data-processing engine. It implements state-of-the-art algorithms for distributed query processing that provide very high performance.

Architected for massive scale and performance

HeatWave massively partitions data across a cluster of nodes that operate in parallel, which provides excellent inter-node scalability. Each node within a cluster and each core within a node can process partitioned data in parallel. HeatWave has an intelligent query scheduler that overlaps computation with network communication tasks to achieve very high scalability across thousands of cores.
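
As a conceptual illustration only, the partition-then-process-in-parallel idea can be sketched in a few lines of Python. This is not HeatWave's implementation: the modulo partitioning rule, the thread pool standing in for cluster nodes, and all function names are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_rows(rows, num_nodes, key_index=0):
    """Assign each row to a node by taking its integer key modulo the node count."""
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[row[key_index] % num_nodes].append(row)
    return partitions


def local_sum(partition, value_index=1):
    """Work done independently on one node: a partial aggregate over its partition."""
    return sum(row[value_index] for row in partition)


def parallel_sum(rows, num_nodes=4):
    """Partition the rows, aggregate each partition concurrently, then combine."""
    partitions = partition_rows(rows, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_sum, partitions))
    return sum(partials)


# Hypothetical data: (order_id, amount) pairs.
rows = [(order_id, order_id * 10) for order_id in range(1, 101)]
total = parallel_sum(rows)
print(total)
```

Each worker touches only its own partition, so adding workers reduces the rows handled per worker. That is the property the scale-out description above relies on.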

Optimized for the cloud and data in object storage

Query processing in HeatWave has been optimized for commodity servers in the cloud. The sizes of the partitions have been optimized to fit the cache of the underlying compute shapes. The overlap of computation with communication is optimized for the network bandwidth available. Various analytics processing primitives use the hardware instructions of the underlying virtual machines (VMs). HeatWave is also designed to be a scale-out data processing engine, optimized to query data in object storage.


HeatWave GenAI

HeatWave GenAI provides integrated and automated generative AI with in-database large language models (LLMs); an automated, in-database vector store; and the ability to have contextual conversations in natural language—letting you take advantage of generative AI without AI expertise or data movement.

In-database LLMs

Use the built-in, optimized LLMs in all Oracle Cloud Infrastructure (OCI) regions, in OCI Dedicated Region, and across clouds, and obtain consistent results with predictable performance across deployments. Help reduce infrastructure costs by eliminating the need to provision GPUs.

Integrated with OCI Generative AI

Access pretrained, foundational models from Cohere and Meta via the OCI Generative AI service.

In-database vector store

Perform retrieval-augmented generation (RAG) across LLMs and your proprietary documents in various formats housed in HeatWave Vector Store to get more accurate and contextually relevant answers—without moving data to a separate vector database.
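
To make the retrieval step of RAG concrete, here is a minimal, self-contained Python sketch. It is an illustration under stated assumptions, not the HeatWave Vector Store API: the word-count "embedding", the cosine ranking, the prompt layout, and the sample documents are all hypothetical stand-ins for a real embedding model and vector index.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts. A real system would use an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(query_vector, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Prepend the retrieved passages to the question before it is sent to an LLM."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Context:\n{context}\n\nQuestion: {question}"


documents = [
    "refund requests are processed within five business days",
    "the cluster can be resized without downtime",
    "support is available by email and phone",
]
question = "how long do refund requests take"
top = retrieve(question, documents)
print(build_prompt(question, documents))
```

The LLM then answers from the retrieved passage rather than from its training data alone, which is why retrieval can improve accuracy on proprietary documents.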

Automated generation of embeddings

Leverage the automated pipeline to help discover and ingest proprietary documents in HeatWave Vector Store, making it easier for developers and analysts without AI expertise to use the vector store.

Scale-out vector processing

Vector processing is parallelized across up to 512 HeatWave cluster nodes and executed at memory bandwidth, helping to deliver fast results with a reduced likelihood of accuracy loss.

HeatWave Chat

Have contextual conversations informed by your unstructured documents in object storage using natural language. Use the integrated Lakehouse Navigator to help guide LLMs to search through specific data sets, helping you reduce costs while getting more accurate results faster.



HeatWave MySQL

HeatWave MySQL is a fully managed database service, and the only cloud service built on MySQL Enterprise Edition, with advanced security features for encryption, data masking, authentication, and a database firewall. HeatWave improves MySQL query performance by orders of magnitude and enables you to get real-time analytics on your transactional data in MySQL—without the complexity, latency, risks, and cost of extract, transform, and load (ETL) duplication to a separate analytics database.

Real-time analytics without ETL

Analytics queries access the most current information as updates from transactions automatically replicate in real time to the HeatWave analytics cluster. There’s no need to index the data before running analytics queries. You can eliminate the complex, time-consuming, and costly ETL process and integration with a separate analytics database.

Faster than Amazon and Snowflake at a fraction of the cost

HeatWave MySQL is faster and delivers better price-performance, as demonstrated by multiple standard industry benchmarks, including TPC-H, TPC-DS, and CH-benCHmark.

Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse, Amazon Aurora, and Amazon RDS are slower and more expensive

  • Amazon Redshift: 4X slower; 10X worse price-performance
  • Snowflake: 4X slower; 15X worse price-performance
  • Google BigQuery: 9X slower; 20X worse price-performance
  • Azure Synapse: 4X slower; 10X worse price-performance
  • Amazon Aurora: 1,400X slower; 2,200X worse price-performance
  • Amazon RDS: 3,500X slower; 4,600X worse price-performance



HeatWave Lakehouse

HeatWave Lakehouse lets users query up to half a petabyte of data in object storage in a variety of file formats, such as CSV, Parquet, Avro, and JSON, as well as export files from other databases. The query processing is done entirely in the HeatWave engine, enabling customers to take advantage of HeatWave for non-MySQL workloads in addition to MySQL-compatible workloads.

Faster and less expensive than Snowflake, Amazon Redshift, Databricks, and Google BigQuery

As demonstrated by a 500 TB TPC-H benchmark, the query performance of HeatWave Lakehouse is

  • 15X faster than Amazon Redshift, delivering 11X better price-performance
  • 18X faster than Databricks, delivering 15X better price-performance
  • 18X faster than Snowflake, delivering 19X better price-performance
  • 35X faster than Google BigQuery, delivering 22X better price-performance

The data load performance of HeatWave Lakehouse is

  • 2X faster than Snowflake, delivering 3X better price-performance
  • 6X faster than Databricks, delivering 6X better price-performance
  • 8X faster than Google BigQuery, delivering 7X better price-performance
  • 9X faster than Amazon Redshift, delivering 8X better price-performance

Fast lakehouse analytics and machine learning on all data

Customers can query data in various formats in object storage, transactional data in MySQL databases, or a combination of both using standard SQL commands. Querying the data in object storage is as fast as querying the databases, as demonstrated by a 10 TB TPC-H benchmark.

With HeatWave AutoML, customers can use data in object storage, the database, or both to automatically build, train, deploy, and explain ML models—without moving the data to a separate ML cloud service.

Scale-out architecture for data management and query processing

HeatWave’s massively partitioned architecture enables HeatWave Lakehouse to scale out. Query processing and data management operations, such as loading and reloading data, scale with the size of the data. Customers can query up to half a petabyte of data in object storage with HeatWave Lakehouse without copying it to the MySQL database. The HeatWave cluster scales to 512 nodes.

Increase performance and save time with machine learning–powered automation

HeatWave Autopilot capabilities, such as auto provisioning, auto query plan improvement, and auto parallel loading, have been enhanced for HeatWave Lakehouse, further reducing database administration overhead and improving performance. New HeatWave Autopilot capabilities are also available for HeatWave Lakehouse.

  • Auto schema inference automatically infers the mapping of file data to the corresponding schema definition for all supported file types, including CSV. As a result, customers don’t need to manually define and update the schema mapping of files, saving time and effort.
  • Adaptive data sampling intelligently samples the files in object storage to derive information used by HeatWave Autopilot to make predictions for automation. Using adaptive data sampling, HeatWave Autopilot can scan a 400 TB file and make predictions, such as schema mapping, in less than one minute.
  • Adaptive data flow lets HeatWave Lakehouse dynamically adapt to the performance of the underlying object store in any region to improve overall performance, price-performance, and availability.
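
The idea behind inferring a schema from a sample of a file can be sketched as follows. This is a simplified illustration, not how HeatWave Autopilot works internally: the three type names, the fixed 100-row sample, and the function names are assumptions chosen for the example.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest of INT, DOUBLE, or VARCHAR that fits every sampled value."""
    def all_parse(cast):
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "INT"
    if all_parse(float):
        return "DOUBLE"
    return "VARCHAR"


def infer_schema(csv_text, sample_rows=100):
    """Read the header plus a bounded sample of rows and infer one type per column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    sample = [row for _, row in zip(range(sample_rows), reader)]
    columns = list(zip(*sample))
    return {name: infer_type(column) for name, column in zip(header, columns)}


# Hypothetical file contents.
csv_text = "id,price,city\n1,9.99,Oslo\n2,12.50,Lima\n3,7,Pune\n"
schema = infer_schema(csv_text)
print(schema)
```

Because only a bounded sample is read, the cost of inference does not grow with file size. The trade-off is that a value outside the sample can contradict the inferred type.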



HeatWave AutoML

HeatWave AutoML includes everything users need to build, train, and explain machine learning models within HeatWave, at no additional cost.

No need for a separate machine learning service

With in-database machine learning in HeatWave, customers don’t need to move data to a separate machine learning service. They can easily and securely apply machine learning training, inference, and explanation to data stored both inside MySQL and in the object store with HeatWave Lakehouse. As a result, they can accelerate ML initiatives, increase security, and reduce costs.

Save time and effort with machine learning lifecycle automation

HeatWave AutoML automates the machine learning lifecycle, including algorithm selection, intelligent data sampling for model training, feature selection, and hyperparameter optimization, saving data analysts and data scientists significant time and effort. Aspects of the machine learning pipeline can also be customized, including algorithm selection, feature selection, and hyperparameter optimization. HeatWave AutoML supports anomaly detection, forecasting, classification, regression, and recommender system tasks, including tasks that use text columns.

Recommender system for personalized recommendations

By considering both implicit feedback (past purchases, browsing behavior, and so forth) and explicit feedback (ratings, likes, and so forth), the HeatWave AutoML recommender system can generate personalized recommendations. Analysts, for instance, can predict items that a user will like, users who will like a specific item, and ratings that items will receive. They can also, given a user, obtain a list of similar users, and given a specific item, obtain a list of similar items.

Interactive HeatWave AutoML console

The interactive console lets business analysts build, train, run, and explain ML models using a visual interface—without using SQL commands or any coding. The console also makes it easy to explore what-if scenarios to evaluate business assumptions—for example, “How would investing 30% more in paid social media advertising affect both revenue and profit?”

Faster, less expensive, and more accurate than Redshift ML

Benchmarks demonstrate that, on average, HeatWave AutoML produces more accurate results than Amazon Redshift ML, trains models up to 25X faster at 1% of the cost, and scales as more nodes are added.


Explainable ML models

All the models trained by HeatWave AutoML are explainable. HeatWave AutoML delivers predictions with an explanation of the results, helping organizations with regulatory compliance, fairness, repeatability, causality, and trust.

Use current skills

Developers and data analysts can build machine learning models using familiar SQL commands; they don’t have to learn new tools and languages. Additionally, HeatWave AutoML is integrated with popular notebooks such as Jupyter and Apache Zeppelin.
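
As a rough sketch of what such SQL commands look like, the Python helper below only assembles the statements as strings; it does not connect to a database. The routine names (sys.ML_TRAIN, sys.ML_MODEL_LOAD, sys.ML_PREDICT_TABLE) and their argument order reflect the HeatWave AutoML documentation as understood here and may differ between releases, so check them against the version in use. The schema, table, and column names are hypothetical.

```python
def automl_statements(train_table, target_column, task, test_table, output_table):
    """Return the SQL statements for a train, load, and batch-predict workflow.

    Names are interpolated directly into the SQL, so pass only trusted values.
    """
    return [
        # Train a model and store its handle in the session variable @model.
        f"CALL sys.ML_TRAIN('{train_table}', '{target_column}', "
        f"JSON_OBJECT('task', '{task}'), @model);",
        # Load the trained model before using it for predictions.
        "CALL sys.ML_MODEL_LOAD(@model, NULL);",
        # Score every row of the test table into a new output table.
        f"CALL sys.ML_PREDICT_TABLE('{test_table}', @model, '{output_table}', NULL);",
    ]


# Hypothetical schema and table names.
statements = automl_statements(
    "demo.census_train", "revenue", "classification",
    "demo.census_test", "demo.census_predictions",
)
for statement in statements:
    print(statement)
```

The statements can be run from any MySQL client or notebook session connected to a HeatWave-enabled DB system.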



HeatWave Autopilot

HeatWave Autopilot provides workload-aware, machine learning–powered automation. It improves performance and scalability without requiring database tuning expertise, increases the productivity of developers and DBAs, and helps eliminate human errors. HeatWave Autopilot automates many of the most important and often challenging aspects of achieving high query performance at scale—including provisioning, data loading, query execution, and failure handling. HeatWave Autopilot is available at no additional charge for HeatWave MySQL customers.

HeatWave Autopilot provides numerous capabilities for both HeatWave and OLTP, including

  • Auto provisioning predicts the number of HeatWave nodes required for running a workload by adaptive sampling of table data on which analytics is required. This means developers and DBAs no longer need to manually estimate the optimal size of their cluster.
  • Auto thread pooling lets the database service process more transactions for a given hardware configuration, delivering higher throughput for OLTP workloads and preventing throughput from dropping at high levels of transactions and concurrency.
  • Auto shape prediction continuously monitors the OLTP workload, including throughput and buffer pool hit rate, to recommend the right compute shape at any given time—allowing customers to always get the best price-performance.
  • Auto encoding determines the optimal representation of columns being loaded into HeatWave, taking the queries into consideration. This optimal representation provides the best query performance and minimizes the size of the cluster, which reduces costs.
  • Auto query plan improvement learns various statistics from the execution of queries and improves the execution plan of future queries. This improves the performance of the system as more queries are run.
  • Adaptive query optimization uses various statistics to adjust data structures and system resources after query execution has started—independently optimizing query execution for each node based on actual data distribution at runtime. This helps improve the performance of ad hoc queries by up to 25%.
  • Auto data placement predicts the column on which tables should be partitioned in memory to achieve the best performance for queries. It also predicts the expected gain in query performance with the new column recommendation. This minimizes data movement across nodes due to suboptimal choices that can be made by operators when manually selecting the column.
  • Auto compression determines the optimal compression algorithm for each column, which improves load and query performance with faster data compression and decompression. By reducing memory usage, customers can cut costs by up to 25%.
  • Indexing (limited availability) automatically determines the indexes that customers should create or drop from their tables to optimize OLTP throughput, using machine learning to make a prediction based on individual application workloads. That helps customers eliminate the time-consuming tasks of creating optimal indexes for their OLTP workloads and maintaining those over time as workloads evolve.
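
To illustrate why choosing a compression algorithm per column can pay off, the sketch below tries several codecs on each column and keeps the smallest result. It is a conceptual example only: the three Python standard-library codecs and the brute-force trial are assumptions for illustration and do not represent the algorithms or the prediction method that HeatWave Autopilot uses.

```python
import bz2
import lzma
import zlib

# Candidate codecs from the Python standard library (illustrative choices).
CODECS = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}


def pick_codec(column_bytes):
    """Compress the column with each codec and return the smallest one plus all sizes."""
    sizes = {name: len(compress(column_bytes)) for name, compress in CODECS.items()}
    return min(sizes, key=sizes.get), sizes


# Hypothetical columns with different value distributions.
columns = {
    "status": ",".join(["shipped", "pending"] * 500).encode(),
    "order_id": ",".join(str(n) for n in range(1000)).encode(),
}
choices = {name: pick_codec(data)[0] for name, data in columns.items()}
for name, data in columns.items():
    best, sizes = pick_codec(data)
    print(name, len(data), best, sizes)
```

Columns with different value distributions can favor different codecs, so a per-column decision can use less memory than one codec applied to the whole table.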



Real-time elasticity

Real-time elasticity enables customers to increase or decrease the size of their HeatWave cluster by any number of nodes without incurring any downtime or read-only time.

Consistent high performance, even at peak times, and reduced costs with no downtime

The resizing operation takes only a few minutes, during which HeatWave remains online and available for all operations. Once the cluster is resized, data is downloaded from object storage, automatically rebalanced among all available cluster nodes, and becomes immediately available for queries. As a result, customers benefit from consistently high performance, even at peak times, and can lower costs by downsizing their HeatWave cluster when appropriate, without incurring any downtime or read-only time.
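
The rebalancing step can be pictured with a small Python sketch. The round-robin assignment, the partition numbering, and the function name are assumptions for illustration; the text above states only that data is rebalanced automatically, not how.

```python
def rebalance(partition_ids, num_nodes):
    """Spread partitions evenly over the nodes using round-robin assignment."""
    assignment = {node: [] for node in range(num_nodes)}
    for position, partition_id in enumerate(sorted(partition_ids)):
        assignment[position % num_nodes].append(partition_id)
    return assignment


# Hypothetical cluster: 12 partitions, resized from 3 nodes to 4.
before = rebalance(range(12), 3)
after = rebalance(range(12), 4)
print({node: len(parts) for node, parts in before.items()})
print({node: len(parts) for node, parts in after.items()})
```

After the resize, every node holds fewer partitions, so per-node memory use and per-query work both drop. The same logic in reverse applies when a cluster is downsized.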

With efficient data reloading from object storage, customers can also pause and resume their HeatWave cluster to reduce costs.

No overprovisioned instances

Customers can expand or reduce their HeatWave cluster to any number of nodes. They aren’t constrained to overprovisioned and costly instances forced by rigid sizing models offered by other cloud database providers. With HeatWave, customers pay only for the exact resources they use.



Available in public clouds and your data center

You can deploy HeatWave on OCI, AWS, or Azure. You can replicate data from on-premises OLTP applications to HeatWave to get near real-time analytics and process vector data in the cloud. You can also use HeatWave in your data center with OCI Dedicated Region.

HeatWave on AWS delivers a native experience for AWS customers. The console, control plane, and data plane reside in AWS.
