We’re sorry. We could not find a match for your search.

We suggest you try the following to help find what you're looking for:

  • Check the spelling of your keyword search.
  • Use synonyms for the keyword you typed, for example, try “application” instead of “software.”
  • Start a new search.
Contact Sales Sign in to Oracle Cloud

MySQL HeatWave Performance

Performance comparison of MySQL HeatWave on OCI with Snowflake, Amazon Redshift, Amazon Aurora, and Amazon RDS for MySQL

Several performance comparisons have been run and the results are presented below. They focus on two different aspects.

  • Query processing
    The performance comparison encompasses a variety of benchmarks—TPC-H, TPC-DS, and CH-benCHmark with different dataset sizes (4 TB, 10 TB, and 30 TB) to validate the speedup provided by HeatWave.
  • Machine learning
    The performance experiments use a wide variety of publicly known datasets for machine learning classification and regression problems.

1. Common setup for analytic (OLAP) workload test for TPC-H and TPC-DS

  • The workload is derived from the TPC's TPC-H and TPC-DS benchmarks.*
  • Generate data using the corresponding data generation tool.
  • Provision and configure the target service.
  • Create the corresponding schema on the target service instance.
  • Import the data generated to the target service instance.
  • Run queries derived from TPC-H or TPC-DS to test the performance.
  • For best performance numbers, always do multiple runs of the query and ignore the first (cold) run.
  • You can always do an explain plan to make sure that you get the best expected plan.

2. Common setup for mixed workload test

  • The workload is derived from CH-benCHmark.**
  • The OLTPBench framework (with changes made to support HeatWave) was used to run the mixed workload benchmark.
  • Provision and configure the target service instance.
  • Create the mixed workload schema (TPC-C and CH-benCH) on the target service instance.
  • Generate and load the 1000 W (100 GB) dataset to the target service instance.
  • Set incoming rate of TPC-C transactions at 500 per second (30K transactions per minute) and incoming rate of CH-benCH transactions at 4 per second (240 transactions per minute).
  • Run 128 concurrent sessions of TPC-C (OLTP) and 4 concurrent sessions of CH-benCH (OLAP).
  • Note the OLTP and OLAP throughput and average latency as reported.

3. HeatWave specific setup

  • Optimal encodings are used for certain columns that will be loaded into HeatWave.
  • Custom data placement is used for certain tables that will be loaded into HeatWave.
  • Mark the tables as offloadable and load them into HeatWave.
  • For each query, force offload to HeatWave using the hint (set_var(use_secondary_engine=forced)).
  • A straight_join hint is required for certain queries to get the optimal query plan for HeatWave.
  • For detailed setup, reference the following:

4. Snowflake on AWS specific setup

  • For TPC-H, use clustering on ORDERS(o_orderdate) and LINEITEM(l_shipdate) tables.
  • You can also try without clustering and pick the scheme that gives better performance.

5. Amazon Redshift specific setup

6. Amazon Aurora specific setup

  • Use the largest shape possible so that as much of the data can fit into the buffer cache as possible.
  • For the 4 TB TPC-H datasets
    • Use the db.r5.24xlarge shapes.
    • Set the innodb_buffer_pool size to 630 GB.
    • Here are other settings that were modified from their default value in our TPC-H experiments: innodb_sort_buffer_size=67108864; lock_wait_timeout =86400; max_heap_table_size=103079215104; tmp_table_size=103079215104.
  • For the 1,000 W TPC-C dataset (used in mixed workload tests)
    • Use the db.r5.8xlarge shape.
    • Use the default parameter settings for innodb_buffer_pool and other parameters.
    • Disable result cache.
    • Enable parallel query.
  • Set aurora_disable_hash_join = 0 and aurora_parallel_query = ON to use parallel query.
  • Follow the best practices for Aurora database configuration for any other tuning.
  • For parallel query to work, make sure that none of the tables are partitioned.
  • A straight_join hint can be used if the query plan looks suboptimal.

7. Amazon Aurora specific setup

  • Use the largest shape possible so that as much of the data can fit into the buffer cache as possible.
  • For the 4 TB TPC-H datasets, use the db.r5.24xlarge shape.
  • Set the innodb_buffer_pool size to 650 GB.
  • Here are other settings that were modified from their default value in our TPC-H experiments: innodb_buffer_pool_instances=64; innodb_page_cleaners=64; innodb_read_io_threads=64; temptable_max_ram=64424509440; max_heap_table_size=103079215104; tmp_table_size=103079215104.
  • Follow the best practices for Amazon RDS for any other tuning.
  • Make sure all the primary and foreign key indexes are created.
  • A straight_join hint can be used if the query plan looks suboptimal.

8. Results

10 TB TPC-DS (March 2022)

MySQL HeatWave Amazon Redshift Snowflake on AWS Google BigQuery Azure Synapse
Instance shape MySQL.HeatWave.
VM.Standard.E3
ra3.4xlarge - - -
Cluster size 15 + 1 MDS 8 X-Large (16) 800 slots DW 2,500c
Geomean time 5.19 seconds 8.2 seconds 13.2 seconds 20.4 seconds 23.2 seconds
Annual cost USD$49,561 USD$150,784 USD$280,320 USD$163,200 USD$165,575

Note: Redshift and Snowflake numbers for 10 TB TPC-DS are provided by a third party.

Note: Redshift pricing is based on one-year reserved instance pricing (paid all upfront). Snowflake pricing is based on standard edition on-demand pricing. Google BigQuery pricing is based on annual flat-rate commitment (per 100 slots), and Azure Synapse pricing is based on the 1 year reserved instance.

4 TB TPC-H (March 2022)

MySQL HeatWave Amazon Redshift Amazon Aurora Amazon RDS for MySQL Snowflake on AWS
Instance shape MySQL.HeatWave.
VM.Standard.E3
ra3.4xlarge db.r5.24xlarge db.r5.24xlarge -
Cluster size 5 + 1 MDS 2 1 1 Medium (4)
Total elapsed time 381 seconds 4,189 seconds 130 hours 338 hours 3,183 seconds
Annual cost USD$18,585 USD$37,696 USD$67,843 USD$54,393 USD$70,080

4 TB TPC-H (March 2022)

MySQL HeatWave Amazon Redshift Snowflake on AWS Azure Synapse Google BigQuery
Instance shape MySQL.HeatWave.
VM.Standard.E3
ra3.4xlarge - DW 1,500c 400 slots
Cluster size 5 + 1 MDS 2 Medium (4) - -
Geomean time 11.6 seconds 130 seconds 107 seconds 31 seconds 109 seconds
Annual cost USD$18,585 USD$37,696 USD$70,080 USD$99,345 USD$81,600

Note: 4 TB TPC-H numbers for Amazon Redshift, Snowflake, Azure Synapse, and Google BigQuery are derived from independent benchmark testing in March 2022.

Note: Redshift pricing is based on one-year reserved instance pricing (paid all upfront). Snowflake pricing is based on standard edition on-demand pricing. Google BigQuery pricing is based on annual flat-rate commitment (per 100 slots), and Azure Synapse pricing is based on the 1 year reserved instance.

4 TB TPC-H (August 2021)

MySQL HeatWave Amazon Redshift Amazon Aurora Amazon RDS for MySQL
Instance shape MySQL.HeatWave.
VM.Standard.E3
dc2.8xlarge db.r5.24xlarge db.r5.24xlarge
Cluster size 10 + 1 MDS 4 1 1
Total elapsed time 224 seconds 728 seconds 130 hours 338 hours
Annual cost USD$34,073 USD$110,560 USD$67,843 USD$54,393

10 TB TPC-H (March 2022)

MySQL HeatWave Amazon Redshift Snowflake on AWS
Instance shape MySQL.HeatWave.
VM.Standard.E3
ra3.4xlarge -
Cluster size 12 + 1 MDS 8 X-Large (16)
Total elapsed time 504 seconds 2,380.4 seconds 2,350.5 seconds
Annual cost USD$40,268 USD$150,784 USD$280,320

Note: Redshift and Snowflake numbers for 4TB TPC-H and 10 TB TPC-H are provided by a third party.

Note: Redshift pricing is based on one-year reserved instance pricing (paid all upfront). Snowflake pricing is based on standard edition on-demand pricing.

10 TB TPC-H (August 2021)

MySQL HeatWave Amazon Redshift Snowflake on AWS
Instance shape MySQL.HeatWave.
VM.Standard.E3
ra3.4xlarge -
Cluster size 25 + 1 MDS 8 X-Large (16)
Total elapsed time 346.5 seconds 2,380.4 seconds 2,350.5 seconds
Annual cost USD$80,536 USD$150,784 USD$420,480

Note: Redshift and Snowflake numbers for 10 TB TPC-H are provided by a third party. Snowflake pricing is based on the enterprise edition.

30 TB TPC-H (March 2022)

MySQL HeatWave Amazon Redshift Snowflake on AWS Azure Synapse Google BigQuery
Instance shape MySQL.HeatWave.
VM.Standard.E3
ra3.16xLarge - DW 15,000c 5,000 slots
Cluster size 42 + 1 MDS (MySQL.HeatWave.
BM.Standard.E3)
20 3X-Large (64) - -
Geomean time 20.59 seconds 32.32 seconds 78.17 seconds 35.44 seconds 108.5 seconds
Annual cost USD$149,495 USD$1,507,840 USD$1,681,920 USD$1,249,956 USD$1,020,000

Note: 30 TB TPC-H numbers for Amazon Redshift, Snowflake, Azure Synapse, and Google BigQuery are derived from independent benchmark testing in October 2020.

* Disclaimer: Benchmark queries are derived from the TPC-H and TPC-DS benchmarks, but results are not comparable to published TPC-H and TPC-DS benchmark results since these do not comply with the TPC-H and TPC-DS specifications.

30 TB TPC-H (August 2021)

MySQL HeatWave Amazon Redshift Snowflake on AWS Azure Synapse Google BigQuery
Instance shape MySQL.HeatWave.
VM.Standard.E3
ra3.16xLarge - DW 15,000c 5,000 slots
Cluster size 75 + 1 MDS (MySQL.HeatWave.
BM.Standard.E3)
20 3X-Large (64) - -
Geomean time 11.4 seconds 32.32 seconds 78.17 seconds 35.44 seconds 108.5 seconds
Annual cost USD$251,714 USD$1,507,840 USD$1,681,920 USD$1,249,956 USD$1,020,000

Note: 30 TB TPC-H numbers for Amazon Redshift, Snowflake, Azure Synapse, and Google BigQuery are derived from independent benchmark testing in October 2020.

* Disclaimer: Benchmark queries are derived from the TPC-H benchmark, but results are not comparable to published TPC-H benchmark results since these do not comply with the TPC-H specification.

1,000 W (100 GB) CH-benCHmark (August 2021)

MySQL HeatWave Amazon Aurora
Instance shape MySQL.HeatWave.VM.Standard.E3 db.r5.8xlarge
Cluster size 2 + 1 MDS 1
OLTP throughput (transactions per minute) 30,000 30,000
OLTP latency 0.02 seconds 0.02 seconds
OLAP throughput (transactions per minute) 6.6 0.06
OLAP latency 35 seconds 637 seconds
Annual cost USD$9,293 USD$22,614

** Disclaimer: CH-benCHmark queries were derived from the TPC-C and CH-benCH queries specified in the OLTPBench framework and are not comparable to any published TPC-C or TPC-H benchmark results since these do not comply with the TPC specifications.

Notes: All costs include only the cost of compute. Storage costs are not included and are extra.

Performance comparison of HeatWave ML with Redshift ML

Setup for the comparison of two different ML problems: classification and regression. For detailed setup, reference the HeatWave ML code for performance benchmarks on GitHub.

1. Common setup

  • Datasets listed below are publicly available. Select a dataset to start your test.

    Datasets for classification

    Dataset Explanation Rows (Training set) Features
    Airlines Predict flight delays. 377,568 8
    Bank Marketing Direct marketing—banking products. 31,648 17
    CNAE-9 Documents with free text business descriptions of Brazilian companies. 757 857
    Connect-4 8-ply positions in the game of connect-4, in which neither player has won yet—predict win/loss. 47,290 43
    Fashion MNIST Clothing classification problem. 60,000 785
    Nomao Active learning is used to efficiently detect data that refers to the same place, based on the Nomao browser. 24,126 119
    Numerai Data is cleaned, regularized, and encrypted global equity data. 67,425 22
    Higgs Monte Carlo simulations. 10,500,000 29
    Census Determine if a person’s income exceeds $50,000 a year. 32,561 15
    Titanic Survival status of individuals. 917 14
    Credit Card Fraud Identify fraudulent transactions. 199,364 30
    KDD Cup (appetency) Predict the propensity of customers to buy new products. 35,000 230

    Datasets for regression

    Dataset Explanation Rows (Training set) Features
    Black Friday Customer purchases on Black Friday. 116,774 10
    Diamonds Predict the price of a diamond. 37,758 17
    Mercedes Time the car took to pass testing. 2,946 377
    News Popularity Predict the number of times articles were shared on social networks. 27,750 60
    NYC Taxi Predict the tip amount for a New York City taxicab. 407,284 15
    Twitter The popularity of a topic on social media. 408,275 78
  • For each dataset, use 70% of the data for training and 30% of the data for testing. We recommend randomly selecting training data from the data set.
  • Compute balanced accuracy for classification and R2 regression for regression on the test set.

2. HeatWave ML specific setup

  • Use a HeatWave cluster with two HeatWave nodes.
  • Run ML_TRAIN with the default arguments, and set task to classification or regression based on the dataset type.
  • Score the generated model using ML_SCORE, with the metric set to balanced_accuracy for classification and R2 for regression.

3. Amazon Redshift ML specific setup

  • Use a Redshift cluster with two nodes on ra3.4xlarge shape.
  • Run CREATE_MODEL with the default values of MAX_CELLS (1M) and MAX_RUNTIME (5400s).
    • For classification dataset, do not specify problem_type and let SageMaker detect the problem type.
    • For regression dataset, run CREATE_MODEL with PROBLEM_TYPE being REGRESSION.
  • Create custom SQL scoring functions to calculate balanced accuracy for classification and R2 for regression.

4. Setup for scalability test

  • Test with 2, 4, 8, and 16 HeatWave node cluster.
  • Run ML_TRAIN on all datasets described in common setup section.
  • Compute HeatWave ML versus Redshift ML speedup for training on every dataset via the following:
    • (Redshift ML default training time budget (90 minutes)) / (HeatWave ML end-to-end ML_TRAIN time)
  • Compute geometric mean of speedup across the datasets for each of the cluster configurations.
  • Once all five geometric mean speedup values are obtained, plot geomean speedup (y-axis) versus cluster size (x-axis).

5. Results

Classification—Performance Comparison

Dataset Accuracy Training time (minutes) Speedup
  Redshift ML HeatWave ML Redshift ML HeatWave ML  
Airlines 0.5 0.6524 90.00 2.71 33.21
Bank 0.8378 0.7115 90.00 3.72 24.19
CNAE-9 X 0.9167 X 5.91 X
Connect-4 0.6752 0.6970 90.00 7.13 12.62
Fashion MNIST X 0.9073 X 181.85 X
Nomao 0.9512 0.9602 90.00 3.30 27.27
Numerai 0.5 0.5184 90.00 0.34 264.71
Higgs 0.5 0.758 90.00 68.58 1.31
Census 0.7985 0.7946 90.00 1.22 73.77
Titanic 0.9571 0.7660 90.00 0.47 191.49
CC Fraud 0.9154 0.9256 90.00 29.06 3.10
KDD Cup X 0.5 X 3.55 X
Geomean 0.712 0.754 90.00 3.561 25.271

Classification—Cost Comparison

Dataset Training cost ($) Speedup
  Redshift ML list Redshift ML
with one-year plan
HeatWave ML  
Airlines 20.00 6.23 0.0479 130.03
Bank 10.76 5.68 0.0658 86.30
CNAE-9 12.97 X 0.10458 X
Connect-4 20.00 6.18 0.1261 49.05
Fashion MNIST 20.00 X 3.2151 X
Nomao 20.00 5.96 0.0583 102.14
Numerai 20.00 5.49 0.0060 913.49
Higgs 20.00 7.27 1.2125 5.99
Census 9.77 6.12 0.0216 283.95
Titanic 0.26 5.60 0.0083 674.32
CC Fraud 20.00 6.70 0.0083 13.03
KDD Cup 20.00 X 0.5138 X
Geomean 10.62 6.115 0.063 97.13

Regression—Performance Comparison

Dataset Accuracy Training time (minutes) Speedup
  Redshift ML HeatWave ML Redshift ML HeatWave ML  
Black Friday 0.54 0.53 90.00 1.14 78.80
Diamonds 0.98 0.98 90.00 2.40 37.42
Mercedes X 0.61 X 1.16 X
News Popularity 0.02 0.01 90.00 0.60 149.13
NYC Taxi 0.19 0.25 90.00 7.34 12.26
Twitter 0.88 0.93 90.00 44.24 2.03
Geomean 0.27 0.26 90.00 3.52 25.58

Regression—Cost Comparison

Dataset Training cost ($) Lower cost
  Redshift ML
list
Redshift ML cost
with one-year plan
HeatWave
ML
 
Black Friday 20.00 2.95 0.02 146.10
Diamonds 7.55 5.13 0.04 120.61
Mercedes 20.00 X 0.02 X
News Popularity 20.00 4.15 0.01 389.08
NYC Taxi 20.00 2.82 0.13 21.76
Twitter 20.00 3.64 0.78 4.66
Geomean 17.00 3.64 0.06 58.66

Scalability

MySQL HeatWave Scalability

Performance comparison of MySQL HeatWave on AWS with Amazon Aurora, Amazon Redshift, Snowflake, Azure Synapse Analytics, and Google BigQuery

Several performance comparisons have been run and the results are presented below. They focus on two different aspects.

  • OLAP performance
    The performance comparison includes the TPC-H benchmark* with a dataset of 4 TB, to validate the speedup provided by MySQL HeatWave on AWS.
  • OLTP performance
    The performance comparison includes the TPC-C benchmark** with a dataset of 10 GB to validate the throughput provided by MySQL HeatWave on AWS.

1. Common setup for analytic (OLAP) workload test for TPC-H

  • The workload is derived from TPC's TPC-H benchmarks.*
  • Generate data using the corresponding data generation tool.
  • Provision and configure the target service.
  • Create the corresponding schema on the target service instance.
  • Import the data generated to the target service instance.
  • Run queries derived from TPC-H to test the performance.
  • For best performance numbers, always do multiple runs of the query and ignore the first (cold) run.
  • You can always do an explain plan to make sure you get the best expected plan.

2. Common setup for transactional (OLTP) workload test for TPC-C

  • The workload is derived from the TPC-C benchmark.**
  • The sysbench framework (with additional Lua scripts for TPC-C) is used to run the TPC-C benchmark.
  • Provision and configure the target service instance.
  • Create the OLTP workload schema (TPC-C) on the target service instance.
  • Generate and load the 100 W (10 GB) dataset to the target service instance.
  • Start with one concurrent session and gradually increase the number of concurrent sessions to see the impact on throughput and latency.
  • Note the OLTP throughput as reported.

3. HeatWave specific setup

  • MySQL.32.256GB shape is used for the MySQL node and HeatWave.256GB shape is used for the HeatWave nodes.
  • For TPC-H benchmark
    • Optimal encodings are used for certain columns that will be loaded into HeatWave.
    • Custom data placement is used for certain tables that will be loaded into HeatWave.
    • Mark the tables as offloadable and load them into HeatWave.
    • For each query, force offload to HeatWave using the hint (set_var(use_secondary_engine=forced)).
    • A straight_join hint is required for certain queries to get the optimal query plan for HeatWave.
  • For TPC-C benchmark
    • Make sure the client and HeatWave instance are in the same physical AZ.
  • For detailed setup, reference the following:

4. Snowflake on AWS specific setup

  • In the experiments, a medium-size cluster is used.
  • For TPC-H, use clustering on ORDERS(o_orderdate) and LINEITEM(l_shipdate) tables.
  • You can also try without clustering and pick the scheme that gives better performance.

5. Amazon Redshift specific setup

6. Microsoft Azure Synapse specific setup

  • In the experiments, DW 1500c cluster is used.

7. Google BigQuery specific setup

  • In the experiments, 400 slots are used for running the queries.

8. Amazon Aurora specific setup for TPC-C benchmark

  • Use the db.r5.8xlarge shape.
  • Use the default parameter settings for innodb_buffer_pool and other parameters.
  • Set adaptive hash index = off; replication = off; redo_log = off.
  • Follow the best practices for Aurora database configuration for any other tuning.

9. Results

4 TB TPC-H (September 2022)

MySQL HeatWave on AWS Amazon Redshift Snowflake on AWS Azure Synapse Google BigQuery
Instance shape HeatWave.256GB + MySQL.32.256GB ra3.4xlarge - - -
Cluster size 10 + 1 MySQL node 2 nodes Medium DW 1500c 400 slots
Geomean time 6.53 seconds 130.62 seconds 107.27 seconds 31.8 seconds 109.47 seconds
Price performance US$0,023 US$0,156 US$0,238 US$0,1 US$0,283

Note: Redshift, Snowflake, Synapse and Google BigQuery numbers for 4 TB TPC-H are provided by a third party.

Note: Redshift pricing is based on one-year reserved instance pricing (paid all upfront). Snowflake pricing is based on standard edition on-demand pricing. Google BigQuery pricing is based on annual flat-rate commitment (per 100 slots), and Azure Synapse pricing is based on the one-year reserved instance.

*Disclaimer: Benchmark queries are derived from the TPC-H benchmarks, but results aren’t comparable to published TPC-H benchmark results since these don't comply with the TPC-H specifications.

100W (10 GB) TPC-C (September 2022)

MySQL HeatWave on AWS node type: MySQL.32.256GB.
Amazon Aurora node type: db.r5.8xlarge.

Concurrency 1 4 16 64 128 256 512 1,024 2,048 4,096
Amazon Aurora throughput 116 471 1,411 3,138 4,615 5,081 4,784 2,487 574 245
MySQL HeatWave throughput 86 322 1,040 3,314 5,198 6,192 6,195 5,953 6,080 6,001

Note: Aurora numbers for TPC-C are provided by a third party.

**Disclaimer: Benchmark queries are derived from the TPC-C benchmarks, but results aren’t comparable to published TPC-C benchmark results since these don’t comply with the TPC-C specifications.