OCI Supercluster and AI Infrastructure

Oracle Cloud Infrastructure (OCI) Supercluster provides ultrafast cluster networking, HPC storage, and OCI Compute bare metal instances. OCI Supercluster is ideal for training generative AI models, including conversational applications and diffusion models. With support for up to tens of thousands of NVIDIA GPUs, OCI Compute bare metal instances and VMs can power applications for computer vision, natural language processing, recommendation systems, and more.


How OCI Supercluster beats the competition

  • Industry-leading scalability for generative AI

    Deploy up to tens of thousands of GPUs per cluster for much greater scalability than similar offerings from other providers.

  • Ultralow latency and ultrahigh bandwidth*

    Reduce the time needed to train AI with a simple Ethernet network architecture that provides ultrahigh performance at massive scale.

  • Access to AI subject matter experts

    Get engineering help with solution architecture, networking, security, auditing, onboarding, application migration, and much more.

* Bandwidth per server is 3,200 Gb/sec for NVIDIA H100 clusters and 1,600 Gb/sec for NVIDIA A100 clusters.

Talk with Oracle about accelerating your GPU workloads.

Explore how OCI supports model training and parallel applications

Deploy tens of thousands of NVIDIA H100 and A100 GPUs

Each OCI Compute bare metal instance is connected using OCI’s ultralow-latency cluster networking, which can scale to tens of thousands of NVIDIA H100 or A100 GPUs in a single cluster. These instances use OCI’s unique high-performance network architecture, which relies on RDMA over Converged Ethernet (RoCE) v2 to deliver latency measured in microseconds between nodes and near line-rate bandwidth.

OCI’s implementation of RoCE v2 provides

  • 1,600 Gb/sec of bandwidth per server and 200 Gb/sec of bandwidth per A100 GPU
  • 3,200 Gb/sec of bandwidth per server and 400 Gb/sec of bandwidth per H100 GPU
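
The per-GPU figures follow from dividing each server's bandwidth across its eight GPUs. The short Python sketch below reproduces that arithmetic and converts the result to gigabytes per second. It assumes eight GPUs per bare metal server and an even split of bandwidth across them; the dictionary and function names are illustrative only, not part of any OCI API.

```python
GPUS_PER_SERVER = 8  # assumption: eight GPUs per bare metal server

# Per-server RoCE v2 bandwidth quoted above, in gigabits per second.
SERVER_BANDWIDTH_GBIT_S = {"A100": 1_600, "H100": 3_200}


def per_gpu_bandwidth_gbit_s(gpu: str) -> float:
    """Split a server's cluster-network bandwidth evenly across its GPUs."""
    return SERVER_BANDWIDTH_GBIT_S[gpu] / GPUS_PER_SERVER


def gbit_to_gbyte(gbit_s: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbit_s / 8


if __name__ == "__main__":
    for gpu in ("A100", "H100"):
        gbit = per_gpu_bandwidth_gbit_s(gpu)
        print(f"{gpu}: {gbit:.0f} Gb/s per GPU = {gbit_to_gbyte(gbit):.0f} GB/s")
    # A100: 200 Gb/s per GPU = 25 GB/s
    # H100: 400 Gb/s per GPU = 50 GB/s
```

The even split is a simplification. It matches the quoted per-GPU numbers but says nothing about how traffic is actually scheduled across network interfaces.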

High-speed RDMA cluster networks

High performance computing on OCI provides powerful, cost-effective computing capabilities to solve complex mathematical and scientific problems across industries.

The chart shows the performance of Oracle’s cluster networking fabric. Below 10,000 simulation cells per core, OCI can achieve scaling efficiency above 100% with popular computational fluid dynamics (CFD) codes, the same performance you would see on-premises. Because bare metal HPC machines carry no virtualization penalty, they can use all the cores on a node without reserving any for costly overhead.
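
"Scaling above 100%" refers to strong-scaling efficiency: the speedup over a baseline run divided by the increase in core count. Values above 100% indicate superlinear scaling, which is often attributed to each core's share of the mesh shrinking enough to fit better in cache. The Python sketch below computes both quantities. The function names and the timings in the demo are hypothetical, chosen only to illustrate the formula; they are not OCI benchmark results.

```python
def scaling_efficiency(base_cores: int, base_time_s: float,
                       cores: int, time_s: float) -> float:
    """Strong-scaling efficiency of a run relative to a baseline run, in percent.

    Efficiency = (speedup over the baseline) / (increase in core count) * 100.
    100% is perfect linear scaling; above 100% is superlinear.
    """
    if min(base_cores, cores) <= 0 or min(base_time_s, time_s) <= 0:
        raise ValueError("core counts and times must be positive")
    speedup = base_time_s / time_s
    core_ratio = cores / base_cores
    return 100.0 * speedup / core_ratio


def cells_per_core(total_cells: int, cores: int) -> float:
    """Mesh cells assigned to each core when a CFD mesh is split evenly."""
    return total_cells / cores


if __name__ == "__main__":
    # Made-up timings for illustration only: 16x the cores, 20x faster.
    print(scaling_efficiency(128, 1000.0, 2048, 50.0))  # 125.0
    print(cells_per_core(10_000_000, 2048))             # 4882.8125
```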

High performance computing (HPC) on OCI

HPC on OCI rivals the performance of on-premises solutions while adding the elasticity and consumption-based pricing of the cloud, with the on-demand ability to scale to tens of thousands of cores simultaneously. Customers get access to high-frequency processors; fast, dense local storage; high-throughput, ultralow-latency RDMA cluster networks; and tools to automate and run jobs seamlessly.

OCI can provide latencies as low as 1.7 microseconds, lower than any other cloud vendor, according to an analysis by Exabyte.io. OCI has also extended RDMA-connected cluster networking to bare metal servers equipped with NVIDIA H100 and A100 GPUs. This groundbreaking back-end network fabric lets customers build clusters with the same low-latency networking and application scalability they can achieve on-premises.
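
One way to see why microsecond latency matters is the textbook latency-bandwidth ("alpha-beta") model. In this model, the time to send an n-byte message is t(n) = alpha + n / beta. This model is our own illustrative choice, not something the text specifies. The Python sketch below assumes alpha equals the 1.7 microsecond best case quoted above and beta equals the 400 Gb/sec per-H100 figure (50 GB/s). Both values are assumptions for illustration, not measured OCI behavior.

```python
# Latency-bandwidth ("alpha-beta") model of one point-to-point message:
#   t(n) = alpha + n / beta
# alpha: the 1.7 microsecond best-case latency quoted above (assumed to apply).
# beta:  assumed 400 Gb/s per H100 GPU, i.e. 50 GB/s (illustrative only).
ALPHA_S = 1.7e-6
BETA_BYTES_PER_S = 400e9 / 8


def transfer_time_s(n_bytes: int, alpha: float = ALPHA_S,
                    beta: float = BETA_BYTES_PER_S) -> float:
    """Estimated time to move n_bytes between two nodes under the model."""
    return alpha + n_bytes / beta


def crossover_bytes(alpha: float = ALPHA_S,
                    beta: float = BETA_BYTES_PER_S) -> float:
    """Message size at which the latency and bandwidth terms are equal."""
    return alpha * beta


if __name__ == "__main__":
    print(f"crossover: {crossover_bytes() / 1000:.0f} KB")  # crossover: 85 KB
    for size in (8, 64 * 1024, 1024**3):
        print(f"{size:>12} bytes: {transfer_time_s(size) * 1e6:.1f} us")
```

Under these assumptions, messages much smaller than roughly 85 KB spend most of their time in the latency term. That is why tightly coupled HPC codes that exchange many small messages benefit most from low latency.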

Unique bare metal GPU clusters

OCI’s bare metal NVIDIA GPU instances offer startups a high-performance computing platform for applications that rely on deep learning, recommendation systems, and massively parallel HPC jobs. GPU instances are ideally suited for model training, inference, physics and image rendering, and other massively parallel applications.

OCI offers bare metal instances with eight NVIDIA H100 or NVIDIA A100 GPUs. While OCI Supercluster lets you scale from hundreds up to tens of thousands of GPUs per cluster, OCI also lets you deploy at a much smaller scale, starting with a single GPU.
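
Because each bare metal instance carries eight GPUs, sizing a cluster for a target GPU count is a ceiling division. The Python sketch below is a minimal planning helper with hypothetical names. It models only full eight-GPU bare metal nodes. Smaller deployments starting at a single GPU, which the text also mentions, are outside what it covers.

```python
import math

GPUS_PER_NODE = 8  # each H100 or A100 bare metal instance has eight GPUs


def nodes_needed(target_gpus: int) -> int:
    """Smallest number of 8-GPU bare metal nodes that covers target_gpus."""
    if target_gpus < 1:
        raise ValueError("target_gpus must be at least 1")
    return math.ceil(target_gpus / GPUS_PER_NODE)


def idle_gpus(target_gpus: int) -> int:
    """GPUs left over when target_gpus is not a multiple of GPUS_PER_NODE."""
    return nodes_needed(target_gpus) * GPUS_PER_NODE - target_gpus


if __name__ == "__main__":
    for target in (1, 1_001, 16_384):
        print(f"{target} GPUs -> {nodes_needed(target)} nodes, "
              f"{idle_gpus(target)} idle")
```

A one-GPU request maps to a full node with seven idle GPUs. That waste is exactly the case where a smaller, single-GPU deployment makes more sense.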

OCI at NVIDIA GTC, the conference for AI and the metaverse

See how OCI and NVIDIA power next-generation AI models

Customers such as Adept, an ML research and product lab developing a universal AI teammate, are using the power of OCI and NVIDIA technologies to build the next generation of AI models. Running thousands of NVIDIA GPUs on clusters of OCI bare metal compute instances and capitalizing on OCI’s network bandwidth, Adept can train large-scale AI and ML models faster and more economically than before.

Microsoft uses OCI for Bing conversational search

“Our collaboration with Oracle and use of Oracle Cloud Infrastructure along with our Microsoft Azure AI infrastructure, will expand access to customers and improve the speed of many of our search results.”

Divya Kumar, Global Head of Marketing for Search and AI

Adept builds a powerful AI teammate for everyone with Oracle and NVIDIA

“With the scalability and computing power of OCI and NVIDIA technology, we are training a neural network to use every software application, website, and API in existence—building on the capabilities that software makers have already created.”

David Luan, CEO

MosaicML scales AI/ML training on OCI

Learn why MosaicML found that OCI is the best foundation for AI training.

SoundHound selects OCI to support huge company growth

“We view this relationship with OCI as long term. We’re excited about taking advantage of the GPUs and using that to train our next generation of voice AI. There's a lot that we think that OCI will provide for us in terms of future growth.”

James Hom, Cofounder and Vice President of Products

Emory University uses Oracle Cloud to help fight Parkinson’s disease

“With Oracle Cloud, we’re running between four and eight GPUs in parallel to vastly accelerate our research progress, meaning we can complete an experiment in just a few hours.”

Hyeokhyen Kwon, Assistant Professor, Biomedical Informatics
Emory University

Softdrive offers next-generation workstations with OCI Compute and NVIDIA A10

“Softdrive is the future of business computers. In the cloud PC market, performance means everything. NVIDIA GPUs on OCI bare metal servers have dramatically improved the experience for our customers.”

Leonard Ivey, Cofounder

University of Michigan improves AI text summaries of academic journals

Researchers used high-performance virtual machines and remote NVIDIA A100 Tensor Core GPUs, which proved effective for running the team’s memory-hungry summarization algorithms.

What’s included with GPU instances on OCI?

Dedicated engineering support

OCI provides world-class technical experts to help you get up and running. We remove the technical barriers of a complex deployment, from planning to launch, to help ensure your success.

  • Solution architecture development
  • Networking, security, and auditing
  • Onboarding to OCI
  • Application migration
  • Post-migration training

Improved economics

OCI is built for enterprises seeking higher performance, consistently lower costs, and easier cloud migration for their current on-premises applications.

  • Private network connectivity that costs 74% less
  • More than 3X better price-performance for compute
  • Up to 44% less expensive infrastructure with local solid-state disks, twice the RAM, RDMA networking, and a performance SLA
  • 20X the input/output operations per second for less than half the cost
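
The last claim combines two ratios. Dividing the IOPS multiple by the cost multiple gives the change in IOPS per dollar. At exactly half the cost, 20X the IOPS works out to 40X the IOPS per dollar, so "less than half the cost" implies more than 40X. The Python sketch below shows that arithmetic. The function name is hypothetical, and the baseline being compared against is not specified in the text.

```python
def iops_per_dollar_multiple(iops_multiple: float, cost_multiple: float) -> float:
    """Relative IOPS per dollar, given IOPS and cost as multiples of a baseline."""
    if cost_multiple <= 0:
        raise ValueError("cost_multiple must be positive")
    return iops_multiple / cost_multiple


if __name__ == "__main__":
    print(iops_per_dollar_multiple(20, 0.5))   # 40.0: exactly half the cost
    print(iops_per_dollar_multiple(20, 0.25))  # 80.0: a quarter of the cost
```
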

November 13, 2023

Announcing plans to offer NVIDIA Grace Hopper Superchip on OCI

Sagar Rawal, Vice President, Oracle Cloud Infrastructure

Today at SC23, we’re announcing plans to offer Oracle Cloud Infrastructure (OCI) Compute instances powered by the NVIDIA GH200 Grace Hopper Superchip. The GH200 pairs an Arm-based NVIDIA Grace CPU with an NVIDIA H100 Tensor Core GPU (Hopper), linked over NVLink-C2C, with a combined 576 GB of fast memory.


Additional cloud architecture and deployment resources

OCI Cloud Adoption Framework (CAF)

Omdia’s perspective on why all clouds are not the same