
Performance comparison of MySQL HeatWave with Snowflake, Amazon Redshift, Amazon Aurora, and Amazon RDS for MySQL

Several performance comparisons have been run, and the results are presented below. They focus on two aspects:

  • Query processing
    The performance comparison covers several benchmarks (TPC-H, TPC-DS, and CH-benCHmark) at different dataset sizes (4 TB, 10 TB, and 30 TB) to validate the speedup provided by HeatWave.
  • Machine learning
    The performance experiments use a wide variety of publicly known datasets for machine learning classification and regression problems.

1. Common setup for analytic workload test for TPC-H and TPC-DS

  • The workload is derived from the TPC's TPC-H and TPC-DS benchmarks.*
  • Generate data using the corresponding data generation tool.
  • Provision and configure the target service.
  • Create the corresponding schema on the target service instance.
  • Import the data generated to the target service instance.
  • Run queries derived from TPC-H or TPC-DS to test the performance.
  • For the best performance numbers, always run each query multiple times and ignore the first (cold) run.
  • Run EXPLAIN on each query to confirm that it uses the expected plan (see the sketch after this list).
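
To make these steps concrete, the following is a minimal MySQL-flavored sketch of the load-and-measure loop for one small TPC-H-derived table; the database name, file path, and query are illustrative placeholders, and Snowflake and Redshift use their own loading commands (COPY) for the same step.

    -- Illustrative sketch only: create a TPC-H-derived table, load generated data, and run a query.
    CREATE DATABASE IF NOT EXISTS tpch;
    USE tpch;

    CREATE TABLE region (
      r_regionkey INT PRIMARY KEY,
      r_name      CHAR(25) NOT NULL,
      r_comment   VARCHAR(152)
    );

    -- Import the file produced by the benchmark data generator (the path is a placeholder;
    -- LOCAL loading must be enabled on both client and server).
    LOAD DATA LOCAL INFILE '/data/tpch/region.tbl'
    INTO TABLE region
    FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n';

    -- Check the plan, then run the query several times and discard the first (cold) run.
    EXPLAIN SELECT r_name, COUNT(*) FROM region GROUP BY r_name;
    SELECT r_name, COUNT(*) FROM region GROUP BY r_name;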

2. Common setup for mixed workload test

  • The workload is derived from CH-benCHmark.**
  • The OLTPBench framework (with changes made to support HeatWave) was used to run the mixed workload benchmark.
  • Provision and configure the target service instance.
  • Create the mixed workload schema (TPC-C and CH-benCH) on the target service instance.
  • Generate and load the 1,000 warehouse (1,000 W, 100 GB) dataset into the target service instance.
  • Set the incoming rate of TPC-C transactions to 500 per second (30,000 transactions per minute) and the incoming rate of CH-benCH transactions to 4 per second (240 transactions per minute).
  • Run 128 concurrent sessions of TPC-C (OLTP) and 4 concurrent sessions of CH-benCH (OLAP).
  • Note the OLTP and OLAP throughput and average latency as reported.

3. HeatWave specific setup

  • Use optimal encodings for certain columns that will be loaded into HeatWave.
  • Use custom data placement for certain tables that will be loaded into HeatWave.
  • Mark the tables as offloadable and load them into HeatWave.
  • For each query, force offload to HeatWave with the optimizer hint /*+ SET_VAR(use_secondary_engine=FORCED) */ (see the sketch after this list).
  • A STRAIGHT_JOIN hint is required for certain queries to get the optimal query plan for HeatWave.
  • For detailed setup, reference the following:
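
A minimal sketch of these HeatWave-side steps is shown below; the table and column choices are illustrative, and the exact encodings and placement keys used in the benchmark come from the detailed setup referenced above.

    -- Illustrative sketch: optional per-column encoding and data placement, applied before loading.
    ALTER TABLE orders MODIFY o_comment VARCHAR(79) NOT NULL
      COMMENT 'RAPID_COLUMN=ENCODING=VARLEN';
    ALTER TABLE orders MODIFY o_custkey INT NOT NULL
      COMMENT 'RAPID_COLUMN=DATA_PLACEMENT_KEY=1';

    -- Mark the tables as offloadable and load them into the HeatWave cluster.
    ALTER TABLE orders   SECONDARY_ENGINE = RAPID;
    ALTER TABLE lineitem SECONDARY_ENGINE = RAPID;
    ALTER TABLE orders   SECONDARY_LOAD;
    ALTER TABLE lineitem SECONDARY_LOAD;

    -- Force offload for a query; STRAIGHT_JOIN pins the written join order where needed.
    SELECT /*+ SET_VAR(use_secondary_engine=FORCED) */ STRAIGHT_JOIN
           o_orderpriority, COUNT(*) AS order_count
    FROM   orders
    JOIN   lineitem ON l_orderkey = o_orderkey
    GROUP  BY o_orderpriority;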

4. Snowflake on AWS specific setup

  • For TPC-H, cluster the ORDERS table on o_orderdate and the LINEITEM table on l_shipdate.
  • You can also try without clustering and pick the scheme that gives better performance (see the sketch below).
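
A brief sketch of the Snowflake-side clustering setup, using the TPC-H table names:

    -- Define clustering keys on the two largest tables.
    ALTER TABLE orders   CLUSTER BY (o_orderdate);
    ALTER TABLE lineitem CLUSTER BY (l_shipdate);

    -- To compare against the unclustered layout, the clustering keys can be dropped again.
    ALTER TABLE orders   DROP CLUSTERING KEY;
    ALTER TABLE lineitem DROP CLUSTERING KEY;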

5. Amazon Redshift specific setup

6. Amazon Aurora specific setup

  • Use the largest shape available so that as much of the data as possible fits in the buffer pool.
  • For the 4 TB TPC-H dataset
    • Use the db.r5.24xlarge shape.
    • Set innodb_buffer_pool_size to 630 GB.
    • Other settings modified from their default values in our TPC-H experiments: innodb_sort_buffer_size=67108864; lock_wait_timeout=86400; max_heap_table_size=103079215104; tmp_table_size=103079215104.
  • For the 1,000 W TPC-C dataset (used in mixed workload tests)
    • Use the db.r5.8xlarge shape.
    • Use the default settings for innodb_buffer_pool_size and other parameters.
    • Disable result cache.
    • Enable parallel query.
  • Set aurora_disable_hash_join = 0 and aurora_parallel_query = ON to use parallel query.
  • Follow the best practices for Aurora database configuration for any other tuning.
  • For parallel query to work, make sure that none of the tables are partitioned.
  • A STRAIGHT_JOIN hint can be used if the query plan looks suboptimal (see the sketch below).
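
A short sketch of the Aurora-side session setup follows; whether these parameters are session-settable depends on the Aurora MySQL version, and the same values can instead be set in the DB cluster parameter group.

    -- Illustrative sketch: enable parallel query and hash joins for the session.
    SET SESSION aurora_parallel_query = ON;
    SET SESSION aurora_disable_hash_join = 0;

    -- Verify that the plan actually uses parallel query: the Extra column of EXPLAIN
    -- should report "Using parallel query" for eligible table scans.
    EXPLAIN
    SELECT o_orderpriority, COUNT(*)
    FROM   orders
    WHERE  o_orderdate >= '1995-01-01'
    GROUP  BY o_orderpriority;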

7. Amazon RDS for MySQL specific setup

  • Use the largest shape available so that as much of the data as possible fits in the buffer pool.
  • For the 4 TB TPC-H dataset, use the db.r5.24xlarge shape.
  • Set innodb_buffer_pool_size to 650 GB.
  • Other settings modified from their default values in our TPC-H experiments: innodb_buffer_pool_instances=64; innodb_page_cleaners=64; innodb_read_io_threads=64; temptable_max_ram=64424509440; max_heap_table_size=103079215104; tmp_table_size=103079215104.
  • Follow the best practices for Amazon RDS for any other tuning.
  • Make sure all the primary and foreign key indexes are created.
  • A STRAIGHT_JOIN hint can be used if the query plan looks suboptimal (see the sketch below).
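
A short sketch of the last two items; the query shown is an illustrative TPC-H-style join, not one of the benchmark queries.

    -- Confirm that the primary and foreign key indexes exist on the large tables.
    SHOW INDEX FROM orders;
    SHOW INDEX FROM lineitem;

    -- STRAIGHT_JOIN forces the tables to be joined in the order they are written,
    -- which can help when the default plan is suboptimal.
    SELECT STRAIGHT_JOIN o_orderpriority, COUNT(*) AS order_count
    FROM   orders
    JOIN   lineitem ON l_orderkey = o_orderkey
    WHERE  o_orderdate >= '1995-01-01'
    GROUP  BY o_orderpriority;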

8. Results


10 TB TPC-DS (March 2022)

                 MySQL HeatWave                 Amazon Redshift   Snowflake on AWS   Google BigQuery   Azure Synapse
Instance shape   MySQL.HeatWave.VM.Standard.E3  ra3.4xlarge       -                  -                 -
Cluster size     15 + 1 MDS                     8                 X-Large (16)       800 slots         DW 2,500c
Geomean time     5.19 seconds                   8.2 seconds       13.2 seconds       20.4 seconds      23.2 seconds
Annual cost      USD$49,561                     USD$150,784       USD$280,320        USD$163,200       USD$165,575

Note: Redshift and Snowflake numbers for 10 TB TPC-DS are provided by a third party.

Note: Redshift pricing is based on one-year reserved instance pricing (paid all upfront). Snowflake pricing is based on standard edition on-demand pricing. Google BigQuery pricing is based on an annual flat-rate commitment (per 100 slots), and Azure Synapse pricing is based on one-year reserved instance pricing.


4 TB TPC-H (March 2022)

                     MySQL HeatWave                 Amazon Redshift   Amazon Aurora    Amazon RDS for MySQL   Snowflake on AWS
Instance shape       MySQL.HeatWave.VM.Standard.E3  ra3.4xlarge       db.r5.24xlarge   db.r5.24xlarge         -
Cluster size         5 + 1 MDS                      2                 1                1                      Medium (4)
Total elapsed time   381 seconds                    4,189 seconds     130 hours        338 hours              3,183 seconds
Annual cost          USD$18,585                     USD$37,696        USD$67,843       USD$54,393             USD$70,080

4 TB TPC-H (August 2021)

                     MySQL HeatWave                 Amazon Redshift   Amazon Aurora    Amazon RDS for MySQL
Instance shape       MySQL.HeatWave.VM.Standard.E3  dc2.8xlarge       db.r5.24xlarge   db.r5.24xlarge
Cluster size         10 + 1 MDS                     4                 1                1
Total elapsed time   224 seconds                    728 seconds       130 hours        338 hours
Annual cost          USD$34,073                     USD$110,560       USD$67,843       USD$54,393

10 TB TPC-H (March 2022)

                     MySQL HeatWave                 Amazon Redshift    Snowflake on AWS
Instance shape       MySQL.HeatWave.VM.Standard.E3  ra3.4xlarge        -
Cluster size         12 + 1 MDS                     8                  X-Large (16)
Total elapsed time   504 seconds                    2,380.4 seconds    2,350.5 seconds
Annual cost          USD$40,268                     USD$150,784        USD$280,320

Note: Redshift and Snowflake numbers for 4 TB TPC-H and 10 TB TPC-H are provided by a third party.

Note: Redshift pricing is based on one-year reserved instance pricing (paid all upfront). Snowflake pricing is based on standard edition on-demand pricing.


10 TB TPC-H (August 2021)

                     MySQL HeatWave                 Amazon Redshift    Snowflake on AWS
Instance shape       MySQL.HeatWave.VM.Standard.E3  ra3.4xlarge        -
Cluster size         25 + 1 MDS                     8                  X-Large (16)
Total elapsed time   346.5 seconds                  2,380.4 seconds    2,350.5 seconds
Annual cost          USD$80,536                     USD$150,784        USD$420,480

Note: Redshift and Snowflake numbers for 10 TB TPC-H are provided by a third party. Snowflake pricing is based on the enterprise edition.


30 TB TPC-H (March 2022)

                 MySQL HeatWave                               Amazon Redshift   Snowflake on AWS   Azure Synapse   Google BigQuery
Instance shape   MySQL.HeatWave.VM.Standard.E3                ra3.16xlarge      -                  DW 15,000c      5,000 slots
Cluster size     42 + 1 MDS (MySQL.HeatWave.BM.Standard.E3)   20                3X-Large (64)      -               -
Geomean time     20.59 seconds                                32.32 seconds     78.17 seconds      35.44 seconds   108.5 seconds
Annual cost      USD$149,495                                  USD$1,507,840     USD$1,681,920      USD$1,249,956   USD$1,020,000

Note: 30 TB TPC-H numbers for Amazon Redshift, Snowflake, Azure Synapse, and Google BigQuery are derived from independent benchmark testing in October 2020.

* Disclaimer: Benchmark queries are derived from the TPC-H and TPC-DS benchmarks, but results are not comparable to published TPC-H and TPC-DS benchmark results since these do not comply with the TPC-H and TPC-DS specifications.


30 TB TPC-H (August 2021)

                 MySQL HeatWave                               Amazon Redshift   Snowflake on AWS   Azure Synapse   Google BigQuery
Instance shape   MySQL.HeatWave.VM.Standard.E3                ra3.16xlarge      -                  DW 15,000c      5,000 slots
Cluster size     75 + 1 MDS (MySQL.HeatWave.BM.Standard.E3)   20                3X-Large (64)      -               -
Geomean time     11.4 seconds                                 32.32 seconds     78.17 seconds      35.44 seconds   108.5 seconds
Annual cost      USD$251,714                                  USD$1,507,840     USD$1,681,920      USD$1,249,956   USD$1,020,000

Note: 30 TB TPC-H numbers for Amazon Redshift, Snowflake, Azure Synapse, and Google BigQuery are derived from independent benchmark testing in October 2020.

* Disclaimer: Benchmark queries are derived from the TPC-H benchmark, but results are not comparable to published TPC-H benchmark results since these do not comply with the TPC-H specification.


1,000 W (100 GB) CH-benCHmark (August 2021)

                                             MySQL HeatWave                 Amazon Aurora
Instance shape                               MySQL.HeatWave.VM.Standard.E3  db.r5.8xlarge
Cluster size                                 2 + 1 MDS                      1
OLTP throughput (transactions per minute)    30,000                         30,000
OLTP latency                                 0.02 seconds                   0.02 seconds
OLAP throughput (transactions per minute)    6.6                            0.06
OLAP latency                                 35 seconds                     637 seconds
Annual cost                                  USD$9,293                      USD$22,614

** Disclaimer: CH-benCHmark queries were derived from the TPC-C and CH-benCH queries specified in the OLTPBench framework and are not comparable to any published TPC-C or TPC-H benchmark results since these do not comply with the TPC specifications.

Note: All costs include compute only; storage costs are extra.

Performance comparison of HeatWave ML with Redshift ML

The setup compares two different ML problems: classification and regression. For the detailed setup, see the HeatWave ML code for performance benchmarks on GitHub.

1. Common setup

  • The datasets listed below are publicly available. Select a dataset to start your test.
    • Datasets for classification

      Dataset              Rows (training set)   Features   Explanation
      Airlines             377,568               8          Predict flight delays.
      Bank Marketing       31,648                17         Direct marketing of banking products.
      CNAE-9               757                   857        Documents with free-text business descriptions of Brazilian companies.
      Connect-4            47,290                43         8-ply positions in the game of Connect Four in which neither player has won yet; predict win/loss.
      Fashion MNIST        60,000                785        Clothing classification problem.
      Nomao                24,126                119        Active learning to detect data that refers to the same place, based on the Nomao browser.
      Numerai              67,425                22         Cleaned, regularized, and encrypted global equity data.
      Higgs                10,500,000            29         Monte Carlo simulations.
      Census               32,561                15         Determine whether a person’s income exceeds $50,000 a year.
      Titanic              917                   14         Survival status of individuals.
      Credit Card Fraud    199,364               30         Identify fraudulent transactions.
      KDD Cup (appetency)  35,000                230        Predict the propensity of customers to buy new products.

    • Datasets for regression

      Dataset              Rows (training set)   Features   Explanation
      Black Friday         116,774               10         Customer purchases on Black Friday.
      Diamonds             37,758                17         Predict the price of a diamond.
      Mercedes             2,946                 377        Time the car took to pass testing.
      News Popularity      27,750                60         Predict the number of times articles were shared on social networks.
      NYC Taxi             407,284               15         Predict the tip amount for a New York City taxicab.
      Twitter              408,275               78         The popularity of a topic on social media.

    • For each dataset, use 70% of the data for training and 30% for testing; select the training rows at random (see the split sketch after this list).
    • Compute balanced accuracy for classification and R2 for regression on the test set.
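
A minimal sketch of a random 70/30 split, using an illustrative table name (census) and a temporary helper column; any reproducible random split gives an equivalent setup.

    -- Illustrative sketch: tag each row with a random value, then split 70/30.
    ALTER TABLE census ADD COLUMN split_key DOUBLE;
    UPDATE census SET split_key = RAND();

    CREATE TABLE census_train AS SELECT * FROM census WHERE split_key <  0.7;
    CREATE TABLE census_test  AS SELECT * FROM census WHERE split_key >= 0.7;

    -- Drop the helper column so it is not treated as a feature during training.
    ALTER TABLE census_train DROP COLUMN split_key;
    ALTER TABLE census_test  DROP COLUMN split_key;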

2. HeatWave ML specific setup

  • Use a HeatWave cluster with two HeatWave nodes.
  • Run ML_TRAIN with the default arguments, and set task to classification or regression based on the dataset type.
  • Score the generated model using ML_SCORE, with the metric set to balanced_accuracy for classification and R2 for regression.
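
A minimal sketch of the HeatWave ML calls for one classification dataset; the schema, table, and column names are illustrative, and argument lists can vary slightly across HeatWave versions.

    -- Illustrative sketch: train on the training table, then score on the held-out test table.
    CALL sys.ML_TRAIN('ml_data.census_train', 'income',
                      JSON_OBJECT('task', 'classification'), @census_model);

    CALL sys.ML_MODEL_LOAD(@census_model, NULL);
    CALL sys.ML_SCORE('ml_data.census_test', 'income', @census_model,
                      'balanced_accuracy', @score, NULL);
    SELECT @score;

    -- For the regression datasets, pass JSON_OBJECT('task', 'regression') to ML_TRAIN
    -- and use the 'r2' metric in ML_SCORE.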

3. Amazon Redshift ML specific setup

  • Use a Redshift cluster with two nodes on ra3.4xlarge shape.
  • Run CREATE MODEL with the default values of MAX_CELLS (1 million) and MAX_RUNTIME (5,400 seconds).
    • For classification datasets, do not specify PROBLEM_TYPE; let SageMaker detect the problem type.
    • For regression datasets, run CREATE MODEL with PROBLEM_TYPE set to REGRESSION.
  • Create custom SQL scoring functions to calculate balanced accuracy for classification and R2 for regression.
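
A minimal sketch of the corresponding Redshift ML call for one dataset; the model, function, IAM role, and S3 bucket names are placeholders.

    -- Illustrative sketch: train a model with the default cell and runtime budgets.
    CREATE MODEL census_model
    FROM (SELECT * FROM census_train)
    TARGET income
    FUNCTION predict_income
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftMLRole'
    -- PROBLEM_TYPE REGRESSION   -- add this clause only for the regression datasets
    SETTINGS (
      S3_BUCKET 'redshift-ml-artifacts',
      MAX_CELLS 1000000,
      MAX_RUNTIME 5400
    );

    -- The generated prediction function is then used by the custom scoring SQL, for example:
    -- SELECT predict_income(age, workclass, education) FROM census_test;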

4. Setup for scalability test

  • Test with 2-, 4-, 8-, and 16-node HeatWave clusters.
  • Run ML_TRAIN on all datasets described in common setup section.
  • Compute HeatWave ML versus Redshift ML speedup for training on every dataset via the following:
    • (Redshift ML default training time budget (90 minutes)) / (HeatWave ML end-to-end ML_TRAIN time)
  • Compute geometric mean of speedup across the datasets for each of the cluster configurations.
  • Once the geometric mean speedup values are obtained for all cluster sizes, plot geomean speedup (y-axis) versus cluster size (x-axis).
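
As a concrete illustration of the last two steps, here is a small sketch that computes the geometric mean speedup per cluster size, assuming the per-dataset speedups have been collected into a results table (the table and column names are illustrative).

    -- Illustrative sketch: geometric mean of the per-dataset speedups for each cluster size.
    CREATE TABLE ml_speedup (cluster_size INT, dataset VARCHAR(64), speedup DOUBLE);
    -- ... insert one row per (cluster size, dataset) with the measured speedup ...

    SELECT cluster_size,
           EXP(AVG(LN(speedup))) AS geomean_speedup
    FROM   ml_speedup
    GROUP  BY cluster_size
    ORDER  BY cluster_size;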

5. Results

Classification—Performance Comparison

                Accuracy                      Training time (minutes)       Speedup
Dataset         Redshift ML    HeatWave ML    Redshift ML    HeatWave ML
Airlines        0.5            0.6524         90.00          2.71           33.21
Bank            0.8378         0.7115         90.00          3.72           24.19
CNAE-9          X              0.9167         X              5.91           X
Connect-4       0.6752         0.6970         90.00          7.13           12.62
Fashion MNIST   X              0.9073         X              181.85         X
Nomao           0.9512         0.9602         90.00          3.30           27.27
Numerai         0.5            0.5184         90.00          0.34           264.71
Higgs           0.5            0.758          90.00          68.58          1.31
Census          0.7985         0.7946         90.00          1.22           73.77
Titanic         0.9571         0.7660         90.00          0.47           191.49
CC Fraud        0.9154         0.9256         90.00          29.06          3.10
KDD Cup         X              0.5            X              3.55           X
Geomean         0.712          0.754          90.00          3.561          25.271

Classification—Cost Comparison

                Training cost ($)                                                Lower cost
Dataset         Redshift ML list    Redshift ML with one-year plan   HeatWave ML
Airlines        20.00               6.23                             0.0479        130.03
Bank            10.76               5.68                             0.0658        86.30
CNAE-9          12.97               X                                0.10458       X
Connect-4       20.00               6.18                             0.1261        49.05
Fashion MNIST   20.00               X                                3.2151        X
Nomao           20.00               5.96                             0.0583        102.14
Numerai         20.00               5.49                             0.0060        913.49
Higgs           20.00               7.27                             1.2125        5.99
Census          9.77                6.12                             0.0216        283.95
Titanic         0.26                5.60                             0.0083        674.32
CC Fraud        20.00               6.70                             0.0083        13.03
KDD Cup         20.00               X                                0.5138        X
Geomean         10.62               6.115                            0.063         97.13

Regression—Performance Comparison

                  Accuracy                      Training time (minutes)       Speedup
Dataset           Redshift ML    HeatWave ML    Redshift ML    HeatWave ML
Black Friday      0.54           0.53           90.00          1.14           78.80
Diamonds          0.98           0.98           90.00          2.40           37.42
Mercedes          X              0.61           X              1.16           X
News Popularity   0.02           0.01           90.00          0.60           149.13
NYC Taxi          0.19           0.25           90.00          7.34           12.26
Twitter           0.88           0.93           90.00          44.24          2.03
Geomean           0.27           0.26           90.00          3.52           25.58

Regression—Cost Comparison

                  Training cost ($)                                                Lower cost
Dataset           Redshift ML list    Redshift ML with one-year plan   HeatWave ML
Black Friday      20.00               2.95                             0.02          146.10
Diamonds          7.55                5.13                             0.04          120.61
Mercedes          20.00               X                                0.02          X
News Popularity   20.00               4.15                             0.01          389.08
NYC Taxi          20.00               2.82                             0.13          21.76
Twitter           20.00               3.64                             0.78          4.66
Geomean           17.00               3.64                             0.06          58.66

Scalability

[Chart: MySQL HeatWave Scalability, showing geomean training speedup (y-axis) versus HeatWave cluster size (x-axis)]