Oracle Cloud HPC solutions combine the performance of on-premises solutions with the elasticity and consumption-based costs of the cloud, giving customers the option to either migrate away from, or supplement, capital intensive on-premises systems. The Oracle Cloud Infrastructure HPC platform includes bare metal compute instances, low latency cluster networks for RDMA, high-performance storage solutions and filesystems, network traffic isolation, and the tools you need to automate and run jobs seamlessly in the cloud. For everything from crash simulations in the automotive industry to seismic analysis for oil and gas companies to special effects rendering for media companies, Oracle’s cloud-based infrastructure is enabling customers to solve complex technical problems—faster.
Solutions by industry
Oracle built the infrastructure and services in the cloud to support the needs of enterprise-class customers who traditionally relied on on-premises systems to achieve timely results. With Oracle, customers avoid long queuing times and potential delays in design and instead focus on reinventing how they develop products, measure risk, deliver experiences, and revolutionize their industries.
Computational fluid dynamics in manufacturing
Image courtesy of Altair
Computational fluid dynamics (CFD) is a common workload that simulates the motion of air and fluids to simplify and speed up product engineering. For example, in the automotive sector, it helps manufacturers simulate cabin airflow, engine oil dynamics, and the airflow around the car to improve fuel efficiency. It is a tightly coupled, MPI-based workload that benefits from Oracle’s 100 Gbps cluster networking, high-frequency Intel processor-based compute instances, and the latest NVIDIA GPUs. At list price, the closest comparable AWS instance is 44% more expensive than Oracle’s HPC shape.
Source: The Open CAE Society of Japan
“We’re excited to collaborate with Oracle to offer our customers CONVERGE on Oracle Cloud Infrastructure. With Oracle Cloud Infrastructure’s bare metal HPC shapes and low latency remote direct memory access (RDMA) networking, we were able to get excellent scaling for CONVERGE.”
—Dr. Kelly Senecal, Owner and Vice President of Convergent Science
The graph below shows CONVERGE 3.0 on Oracle Cloud Infrastructure scaling nearly linearly to 4,000 cores for a combusting, turbulent, partially premixed Sandia Flame D simulation with 170 million cells.
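To make "near-linear scaling" concrete, here is a minimal sketch of how speedup and parallel efficiency are usually computed from wall-clock times. The timings in it are hypothetical placeholders chosen for illustration, not measured CONVERGE results, and the function name is an assumption.

```python
def parallel_efficiency(base_cores, base_time, cores, time):
    """Return (speedup, efficiency) of a run relative to a baseline run.

    Speedup is the ratio of wall-clock times; efficiency divides that by
    the ideal speedup (the ratio of core counts). 1.0 means perfectly
    linear scaling.
    """
    speedup = base_time / time
    ideal_speedup = cores / base_cores
    return speedup, speedup / ideal_speedup


# Hypothetical (cores, wall-clock seconds) pairs, NOT benchmark data.
runs = [(500, 8000.0), (1000, 4100.0), (2000, 2150.0), (4000, 1150.0)]

base_cores, base_time = runs[0]
for cores, seconds in runs:
    speedup, efficiency = parallel_efficiency(base_cores, base_time, cores, seconds)
    print(f"{cores:>5} cores: speedup {speedup:5.2f}x, efficiency {efficiency:6.1%}")
```

An efficiency that stays close to 100% as the core count grows is what a near-linear scaling curve looks like in numbers.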
When Nissan needed the best place to run its computational fluid dynamics (CFD) workloads, it picked Oracle Cloud Infrastructure (OCI). Nissan relies on digital product design to make quick and critical design decisions that improve its cars’ fuel efficiency, reliability, and safety. Computationally intensive and latency-sensitive CFD simulations are critical to achieving these efficiencies. By migrating these workloads to Oracle Cloud Infrastructure, Nissan obtains on-premises levels of performance with cloud flexibility.
“We selected Oracle Cloud Infrastructure’s HPC solutions as a part of our multi-cloud strategy to meet the challenges of increased simulation demand under constant cost savings pressure. I believe Oracle will bring significant ROI to Nissan.”
—Bing Xu, General Manager, Engineering Systems Department, Nissan Motor Co, Ltd
“In the world of computational fluid dynamics (CFD), there is constant pressure to accelerate the speed of product design and today, our customers are looking to turn around high-fidelity simulations in hours, not weeks. Running Simcenter STAR-CCM+ on Oracle Cloud Infrastructure has enabled our customers to scale-up their simulations quickly and easily without expensive hardware investment or compromising solution fidelity. Our customers get the same performance and scaling as they get on-premise at lower cost, enabling them to make better engineering decisions faster.”
—Keith Foston, Cloud Product Manager, Siemens
Digital twin product engineering and testing in manufacturing
Digital twins are commonly used to speed prototype design and testing before physical products are produced. A variety of product lifecycle management (PLM) and engineering simulation software packages are used by manufacturers, all of which require significant CPU-based or GPU-based compute resources.
Altair AcuSolve is able to scale performance with node counts at near-ideal levels on Oracle Cloud Infrastructure using our cluster networks for RDMA.
Deep learning and GPU-accelerated computing
With the explosion of business data ranging from customer data to the Internet of Things (IoT), data scientists need the flexibility to explore and build deep learning models quickly and with more flexibility than traditional on-premises IT hardware can provide. Oracle Cloud provides GPU compute instances for deep learning, easy-to-deploy images, and the flexibility to run a single-GPU workstation or cluster of multi-GPU shapes.
Visual-recognition deep-learning models benefit from several Oracle Cloud Infrastructure capabilities and innovations. These include the announced NVIDIA A100 Tensor Core GPU compute instances with up to 8 GPUs and NVLink, paired with the latest 2nd Gen AMD EPYC processors running at 2.9 GHz with up to 64 physical cores, along with local NVMe storage for low-latency data access for workloads that rely on heavy checkpointing. These GPU instances will be the first on Oracle Cloud Infrastructure to support cluster networking, our 100 Gbps RDMA interconnect that lets customers run MPI workloads with latencies of less than 2 microseconds and a combined 1.6 Tbps of bandwidth.
“Oracle Cloud Infrastructure was the first to come out with a new NVIDIA Tesla Cloud solution. The Tensor cores run about 125 teraflops but use only about 300 watts of power. It allows us to run models and data sets far in advance of anything we had done before and see patterns in data we couldn’t see before which are not obvious to humans. The first model we ran with machine learning was 40% more accurate than the version of the model that was in production at that time. We had expected it to take hours, but it only took minutes.”
—James Kelloway, Energy Intelligence Manager, National Grid ESO
Financial applications, including trading applications, require high-performance, low-latency infrastructure that delivers very consistent, “low jitter” performance. Early cloud architectures were not designed for these applications, so they have been slow to move to the cloud. Oracle Cloud Infrastructure provides the performance characteristics these applications require, such as sub-2-microsecond in-cluster latency, rivaling custom-built and expensive on-premises solutions.
“Oracle Cloud Infrastructure is able to support deterministic latencies at the 10μs level at very high message volumes. There is sufficient evidence to justify exploring deployment of low-latency sensitive applications to OCI. This is significant because services requiring this service level avoid expensive on-site deployments.”
—Larry Ryan, Chief Technical Officer, BJSS
Visual effects rendering
High performance computing provides the horsepower for today’s omnipresent visual effects. From blockbuster movie special effects to TV ads and the latest PC and console game titles, all are developed by media companies that need HPC and GPU performance on demand. The performance of NVIDIA Quadro Virtual Workstation on OCI is consistent with expensive high-end graphics workstations, but with Oracle you can access this performance for a few dollars an hour. Below we share some of the SPECviewperf 13 benchmark results. Try it for yourself by provisioning a GPU in Oracle Cloud and running the benchmark.
To give users a sense of how this performs, we ran the SPECviewperf® 13 benchmark on our VM.2.1 GPU shape, which provides one NVIDIA P100 GPU, and compared it to a workstation powered by a P2000. The SPECviewperf® 13 benchmark is the worldwide standard for measuring graphics performance based on professional applications.
“With Oracle Cloud Infrastructure, there’s no need to queue requests or schedule renderings. Our customers can access an unlimited number of machines whenever they need them, without having to pay for unused capacity when they don’t.”
—Mark Ross, Cofounder, GridMarkets
“Around the globe, virtualization is helping enterprises stay productive during these challenging times. With Quadro Virtual Workstations on Oracle Cloud, creative and technical professionals can easily access the performance they need to work anywhere.”
—Anne Hecht, Senior Director of virtualization product marketing, NVIDIA
Oracle Cloud Infrastructure’s supercomputing platform gives researchers access to bare metal NVIDIA GPUs, high performance computing instances, and a low-latency cluster network. Researchers can create clusters for running large-scale computations to accelerate research in multiple branches of science and engineering, such as drug discovery, genomics, weather forecasting, and space exploration. Through programs like Oracle for Research, Oracle is working closely with research organizations like the University of Bristol and Royal Holloway, University of London to help accelerate the development of vaccines and advanced solutions that address climate change.
“We can simulate carbon capture sequestration scenarios, address complex environmental problems, and drive meaningful change in the world. Oracle has helped us break the barrier of computational power in the lab.”
—Professor Hier-Majumder, Royal Holloway, University of London.
HPC solutions that can burst, scale, and respond to researchers’ needs help accelerate medical research and the delivery of potential candidates for treating diseases. The University of Bristol in the UK uses Oracle HPC solutions to analyze imaging data for medical research.
“Our ambition is to create a platform to react quickly to disease, which involves the creation of terabytes of imaging data. Using Oracle Cloud, we can distribute the data across multiple processors and get results in a fraction of the time of a traditional on-premise system.”
—Imre Berger, Professor of Biochemistry and Chemistry, University of Bristol
Data throughput is extremely important for HPC applications to operate efficiently and to enable data sharing across the compute cluster. Loading and storing massive data sets during processing require a file system that can respond to requests extremely fast and reliably, with consistent, linear responsiveness. Oracle Cloud delivers multiple HPC file system models, including GlusterFS, BeeGFS, Lustre, and IBM Spectrum Scale high-performance file systems with stable, high-speed throughput.
“My team has tested SAS Grid on many public clouds. We are happy to say that Oracle Cloud’s infrastructure provides the I/O throughput to the IBM Spectrum Scale shared file system that is needed for SAS Grid.”
—Margaret Crevar, Senior Manager, SAS Performance Lab, SAS
“Oracle’s bare metal compute and cluster networking technologies allowed BeeGFS on Oracle Cloud to outperform our on-premises HPC file system latency and throughput for MPAS workloads at a very low price point. Using Oracle Cloud’s RDMA cluster networking, BeeGFS can see performance of up to 140 GB/s with as little as 14 servers.”
—Simon Ponsford, Chief Technical Officer, YellowDog
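As a rough sanity check on the BeeGFS figure quoted above, the sketch below converts the aggregate throughput into a per-server rate. It assumes the load is spread evenly across servers, that 1 GB means 10^9 bytes, and that each server has a single 100 Gbps cluster-network link (the NIC speed stated elsewhere on this page); none of these assumptions comes from the quote itself.

```python
def per_server_throughput(aggregate_gb_per_s, servers):
    """Return (GB/s per server, Gbps per server), assuming an even split."""
    gb_per_s = aggregate_gb_per_s / servers
    return gb_per_s, gb_per_s * 8  # 8 bits per byte


gb_per_s, gbps = per_server_throughput(140, 14)
nic_gbps = 100  # assumed: one 100 Gbps cluster-network link per server

print(f"{gb_per_s:.1f} GB/s per server = {gbps:.0f} Gbps "
      f"({gbps / nic_gbps:.0%} of a {nic_gbps} Gbps link)")
```

Under those assumptions, each file server would be moving data at a large fraction of its network line rate, which is consistent with the network rather than the storage being the limiting factor.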
HPC services on Oracle Cloud
High core frequencies and cluster networking give Oracle’s bare metal compute instances significant performance improvements over other public clouds and onsite data centers. Bare metal compute instances provide exceptional isolation, visibility, and control.
While our standard bare metal servers include dual 25 Gbps Ethernet for fast networking, Oracle’s groundbreaking back-end network fabric uses Mellanox ConnectX-5 100 Gbps network interface cards to provide RDMA over Converged Ethernet (RoCE) v2, creating clusters with the same low-latency networking and application scalability you expect from your on-premises clusters.
Oracle Cloud has high-performance virtual machine and bare metal NVIDIA-based options for graphics-intensive rendering, AI, and deep learning workloads.
Oracle Linux for High Performance Computing provides a fully supported, open, and complete operating environment that is 100% application binary-compatible with Red Hat Enterprise Linux. Oracle Linux delivers virtualization, management, and cloud native computing tools, along with the Linux operating system (OS), in a single offering that meets high performance computing requirements. Customers running HPC on Oracle Linux in Oracle Cloud are seeing impressive performance gains with no sacrifices and no additional support costs. With crash simulation and CFD workloads, Oracle Linux provides a 4-6% improvement in simulation times.
Traditional storage simply can’t provide enough throughput for performance-intensive workloads that process large volumes of data quickly. To meet these needs, Oracle makes it easy to deploy GlusterFS, BeeGFS, Lustre, and IBM Spectrum Scale high performance file systems that can deliver up to 453 GBps aggregate throughput to HPC clusters.
Ready to deploy HPC solutions
Easy, automated, cluster deployment
Deploy clusters quickly and easily with an Oracle Cloud Marketplace stack (Terraform template) that includes all the key components to get up and running quickly. The stack provides the ability to install the Slurm scheduler, OpenMPI, and tools to test MPI connectivity.
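Once a cluster with Slurm and OpenMPI is running, a connectivity test is typically submitted as a batch job. The following is a minimal sketch that renders such a job script. The function name, job name, time limit, and the `./mpi_pingpong` executable are hypothetical placeholders for illustration and are not part of the Marketplace stack; only standard `#SBATCH` directives and `mpirun -np` are used.

```python
def render_mpi_test_job(nodes, tasks_per_node, benchmark="./mpi_pingpong"):
    """Build a Slurm batch script that launches an MPI connectivity test.

    `benchmark` is a hypothetical executable name; substitute whichever
    MPI test binary is actually installed on the cluster.
    """
    total_ranks = nodes * tasks_per_node
    lines = [
        "#!/bin/bash",
        "#SBATCH --job-name=mpi-connectivity-test",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        "#SBATCH --time=00:10:00",
        "#SBATCH --output=mpi-test-%j.out",  # %j expands to the Slurm job ID
        "",
        f"mpirun -np {total_ranks} {benchmark}",
    ]
    return "\n".join(lines) + "\n"


# One rank on each of two nodes forces traffic across the cluster network.
print(render_mpi_test_job(nodes=2, tasks_per_node=1))
```

Saving the output to a file and submitting it with `sbatch` would run the test; placing one rank per node is a common way to make sure the messages actually cross the interconnect rather than staying in shared memory.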
Easy file system deployment
Oracle makes it easy to deploy industry-leading high-performance file servers at petabyte scale with Oracle Cloud Marketplace stacks, which include automation rooted in best practices to reduce complexity and time to deployment. With just a few clicks, file systems can be up and running in less than 15 minutes. Oracle Cloud Marketplace includes an easy-to-deploy stack that covers BeeGFS, Lustre, and GlusterFS, as well as additional customizable stacks for each individual file system.
VMs for data science
Oracle Cloud Infrastructure Virtual Machines for Data Science are preconfigured environments that enable you to build models and deliver business value faster. They offer exceptional performance, security, and control. You can expand your compute resources as needed using compute autoscaling and keep costs under control by stopping compute instances when they are not needed.
You can have a virtual machine with an NVIDIA GPU up and running in less than 15 minutes with preinstalled common IDEs, notebooks, and frameworks. Oracle Cloud Infrastructure VMs for Data Science include basic sample data and code for you to test and explore.
Oracle Cloud HPC price-performance
We designed our HPC instances for the most computationally intensive workloads requiring the fastest single threaded performance and the lowest latency network. All HPC instances have a unique direct memory interconnect network powered by a non-virtualized and bare-metal RDMA network. We deliver high-frequency processors, fast and dense local storage, and an RDMA cluster network with < 2 microsecond latency across clusters of tens of thousands of cores. AWS doesn’t offer this architecture, and their closest solution, the C5n, is significantly more expensive.
| | Oracle Cloud Infrastructure BM.HPC2.36 | AWS c5n.metal |
|---|---|---|
| List price (per hour) | $2.70 | $3.888 (US East) |
| Storage | Local NVMe SSDs | No local NVMe SSD |
| SPECrate 2017 Integer | 238 | 237 |
| SPECrate 2017 Floating Point | 206 | 206 |
| Summary | Lower costs for better performance with RDMA and performance guarantee | 44% more expensive, no local SSD storage, half the RAM, with no RDMA and no performance SLA |
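The "44% more expensive" figure in the table follows directly from the two list prices. The short sketch below reproduces it and also shows the same gap expressed from Oracle's side, since "X% more expensive" and "X% less expensive" use different baselines and are not interchangeable. The function names are illustrative; the prices are the ones in the table.

```python
oci_price = 2.70   # BM.HPC2.36 list price from the table
aws_price = 3.888  # c5n.metal list price (US East) from the table


def pct_more_expensive(price, baseline):
    """How much more `price` costs than `baseline`, as a percent of baseline."""
    return (price - baseline) / baseline * 100


def pct_less_expensive(price, baseline):
    """How much less `price` costs than `baseline`, as a percent of baseline."""
    return (baseline - price) / baseline * 100


print(f"AWS vs. Oracle: {pct_more_expensive(aws_price, oci_price):.1f}% more expensive")
print(f"Oracle vs. AWS: {pct_less_expensive(oci_price, aws_price):.1f}% less expensive")
```

With these list prices, AWS comes out 44.0% more expensive than Oracle, which is the same gap as Oracle being about 30.6% less expensive than AWS.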
“Oracle Cloud Infrastructure and Rocky DEM have collaborated to provide a scalable experience to customers with performance similar to on-premises clusters. The bare metal NVIDIA GPU servers, without hypervisor overhead, further help to tackle very large problems in a reasonable amount of time.”
—Marcus Reis, Vice President of ESSS