Use standard SQL syntax to query data in object storage in file formats such as CSV, Parquet, and Avro, export files from other databases, transactional data in MySQL databases, or any combination of these. Data in object storage isn’t copied to the MySQL database. Instead, query processing is done entirely in the HeatWave engine, so you can use HeatWave for both non-MySQL and MySQL-compatible workloads. When loaded into the HeatWave cluster, data from any source is automatically transformed into a single optimized internal format. As a result, querying data in object storage is as fast as querying the database, an industry first.
Oracle MySQL HeatWave’s unrivaled performance is a result of its scale-out architecture, which enables massive parallelism to provision the cluster, load data, and process queries with up to 512 nodes. Each HeatWave node within a cluster and each core within a node can process partitioned data in parallel, including parallel scans, joins, group-by, aggregation, and top-k processing. The algorithms are designed to overlap compute time with the communication of data across nodes, which helps achieve high scalability.
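To make the idea of partitioned parallel processing concrete, here is a minimal Python sketch of a partitioned group-by followed by a top-k step. It is an illustration of the general technique only, not HeatWave's implementation: the function names, the sample rows, and the use of a thread pool to stand in for nodes and cores are all assumptions made for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import heapq


def partition_rows(rows, num_partitions):
    """Hash-partition (key, value) rows so each key lands in exactly one partition."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in rows:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def aggregate_partition(partition):
    """Compute per-key sums for a single partition (the work one worker would do)."""
    totals = Counter()
    for key, value in partition:
        totals[key] += value
    return totals


def parallel_group_by_sum(rows, num_partitions=4):
    """Aggregate every partition concurrently, then combine the partial results."""
    partitions = partition_rows(rows, num_partitions)
    # Threads only illustrate the structure here; a real engine would spread
    # partitions across separate cores and nodes.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(aggregate_partition, partitions))
    merged = {}
    for partial in partials:
        # Keys are disjoint across partitions, so a plain update is enough.
        merged.update(partial)
    return merged


def top_k(totals, k):
    """Return the k (key, total) pairs with the largest totals, largest first."""
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])


# Hypothetical sample data: (region, amount) rows.
ROWS = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1), ("south", 8)]
TOTALS = parallel_group_by_sum(ROWS)
TOP_2 = top_k(TOTALS, 2)
print(TOTALS)
print(TOP_2)
```

Because rows are partitioned by key, each worker can finish its aggregation without coordinating with the others, and the final merge is a cheap union of disjoint results. That independence is what lets this style of processing scale as partitions are added.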
MySQL Autopilot provides workload-aware automation for MySQL HeatWave powered by machine learning (ML). MySQL Autopilot capabilities, such as auto provisioning, auto query plan improvement (which learns various runtime statistics from past query executions to improve the execution plan for future queries), and auto parallel loading, have been enhanced for MySQL HeatWave Lakehouse. Additional capabilities for HeatWave Lakehouse include the following:
With HeatWave AutoML, you can use data in object storage, the database, or both to build, train, deploy, and explain ML models. You don’t need to move the data to a separate ML cloud service, or be an ML expert. HeatWave AutoML automates the machine learning pipeline, including algorithm selection, intelligent data sampling for model training, feature selection, and hyperparameter optimization—saving data analysts significant time and effort. HeatWave AutoML supports anomaly detection, forecasting, classification, regression, and recommender system tasks, even on text columns. You can use HeatWave AutoML at no additional cost.
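As a rough illustration of what starting a training run can look like, the Python sketch below builds a call to the `sys.ML_TRAIN` stored routine that HeatWave AutoML documents for model training. The table name, target column, and session variable are hypothetical, the helper functions are invented for this example, and the sketch is limited to classification and regression because other task types take additional options. Check the HeatWave AutoML documentation for the exact options your version supports.

```python
# Tasks this sketch handles; other tasks need extra options that are not covered here.
SUPPORTED_TASKS = {"classification", "regression"}


def quote_sql_string(value):
    """Quote a Python string as a MySQL string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "''") + "'"


def build_ml_train_call(table, target_column, task, handle_var="@model"):
    """Build a CALL statement that trains a model and stores its handle in a session variable."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"unsupported task for this sketch: {task}")
    return (
        f"CALL sys.ML_TRAIN({quote_sql_string(table)}, "
        f"{quote_sql_string(target_column)}, "
        f"JSON_OBJECT('task', {quote_sql_string(task)}), {handle_var});"
    )


# Hypothetical schema, table, and column names.
TRAIN_SQL = build_ml_train_call("demo.customer_churn", "churned", "classification")
print(TRAIN_SQL)
```

The resulting statement would be run over an ordinary MySQL connection to the DB system. Algorithm selection, sampling, feature selection, and hyperparameter tuning happen inside the routine, so nothing about them appears in the call.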
Using large language models (LLMs), applications can interact with HeatWave Lakehouse in natural language. Currently in private preview, the vector store will enable you to leverage the power of LLMs in combination with your proprietary data to get more relevant and accurate answers than those derived from models trained on only public data. With generative AI and vector store capabilities, you’ll interact with MySQL HeatWave using natural language and efficiently search proprietary documents in various file formats in HeatWave Lakehouse.
Tasks such as high-availability management, patching, upgrades, and backups are automated with a fully managed database service. Data loaded into the HeatWave cluster is automatically recovered in case of an unexpected compute node failure, without retransformation from external data formats.
With access control mechanisms such as Oracle Cloud Infrastructure (OCI) resource principal authentication or pre-authenticated requests, you can have full control over access to data lake sources. When running HeatWave Lakehouse in AWS, you can define identity and access management roles and policies to grant access only to specific S3 data.
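For the AWS case, a policy that grants access only to specific S3 data might look like the one generated by the following Python sketch. The bucket name, prefix, and helper function are hypothetical, and the exact set of permissions that HeatWave Lakehouse on AWS requires is not stated here, so treat this as a minimal read-only example in the standard IAM policy format rather than a complete configuration.

```python
import json


def build_s3_read_policy(bucket, prefix):
    """Build an IAM policy document allowing read-only access to one prefix of one bucket."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so it is narrowed with a prefix condition.
                "Sid": "ListOnlyThePrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Reading is an object-level action, so the resource itself is scoped.
                "Sid": "ReadObjectsUnderPrefix",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


# Hypothetical bucket and prefix.
POLICY = build_s3_read_policy("example-lakehouse-bucket", "sales/2024/")
print(json.dumps(POLICY, indent=2))
```

Attaching a policy like this to the role the service assumes keeps every other bucket and prefix in the account out of reach.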