Use standard SQL syntax to query data in object storage in a variety of file formats, including CSV, Parquet, Avro, and export files from other databases, and optionally combine it with transactional data in MySQL databases. Query processing is done entirely in the HeatWave engine, so you can use HeatWave for non-MySQL workloads and MySQL-compatible workloads alike. When loaded into the HeatWave cluster, data from any source is automatically transformed into a single optimized internal format. As a result, querying data in object storage is as fast as querying the database, an industry first.
You can use HeatWave to query semistructured JSON data in object storage, for example, to develop content management apps or real-time dashboards. With native JavaScript support in HeatWave Lakehouse, you can also use JavaScript to process and query data in object storage. For example, you can build dynamic content-loading applications using the rich features of JavaScript.
With HeatWave Vector Store, you can upload and query unstructured documents.
HeatWave’s unrivaled performance is a result of its scale-out architecture, which enables massively parallel cluster provisioning, data loading, and query processing across up to 512 nodes. Each HeatWave node within a cluster and each core within a node can process partitioned data in parallel, including parallel scans, joins, group-by, aggregation, and top-k processing. The algorithms are designed to overlap compute time with the communication of data across nodes, which helps achieve high scalability.
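To make the idea of partitioned parallel processing concrete, here is a minimal conceptual sketch in Python. It is not HeatWave's implementation or API. It assumes rows are hash-partitioned on the grouping key, uses threads as a stand-in for nodes and cores, and every function name in it (`partition_rows`, `local_group_sum`, `parallel_group_sum`, `top_k`) is hypothetical. It does not model the overlap of compute and communication.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import heapq


def partition_rows(rows, num_partitions):
    """Split (key, value) rows into partitions by hashing the key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in rows:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def local_group_sum(partition):
    """Group-by and sum within a single partition (one 'node')."""
    totals = Counter()
    for key, value in partition:
        totals[key] += value
    return totals


def parallel_group_sum(rows, num_partitions=4):
    """Aggregate every partition concurrently, then merge the partial results."""
    partitions = partition_rows(rows, num_partitions)
    # Threads stand in for nodes here; this is an illustration, not a benchmark.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(local_group_sum, partitions))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values per key
    return dict(merged)


def top_k(totals, k):
    """Return the k groups with the largest totals, largest first."""
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])


rows = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
totals = parallel_group_sum(rows)
print(top_k(totals, 2))  # [('east', 17), ('west', 6)]
```

Because the rows are partitioned on the grouping key, each key lands in exactly one partition, so the merge step only has to combine disjoint partial results.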
HeatWave Autopilot provides workload-aware automation for HeatWave powered by machine learning (ML). HeatWave Autopilot capabilities, such as auto provisioning, auto query plan improvement (which learns various runtime statistics from past query executions to improve the execution plan for future queries), and auto parallel loading, have been enhanced for HeatWave Lakehouse. Additional capabilities for HeatWave Lakehouse include the following:
With HeatWave AutoML, you can use data in object storage, the database, or both to build, train, deploy, and explain ML models. You don’t need to move the data to a separate ML cloud service or be an ML expert. HeatWave AutoML automates the machine learning pipeline, including algorithm selection, intelligent data sampling for model training, feature selection, and hyperparameter optimization, which saves data analysts significant time and effort. HeatWave AutoML supports anomaly detection, forecasting, classification, regression, and recommender system tasks, even on text columns. You can use HeatWave AutoML at no additional cost.
As a fully managed service, HeatWave automates tasks such as high-availability management, patching, upgrades, and backups. Data loaded into the HeatWave cluster is automatically recovered in case of an unexpected compute node failure, without retransformation from external data formats.
With access control mechanisms, such as Oracle Cloud Infrastructure (OCI) resource principal authentication or pre-authenticated requests, you can have full control over access to data lake sources. When running HeatWave Lakehouse in AWS, you can define identity and access management roles and policies to grant access only to specific S3 data.