AI infrastructure

Run the most demanding AI workloads faster, including generative AI, computer vision, and predictive analytics, anywhere in our distributed cloud. Get the latest GPU compute, scaling up to the 32,768-GPU Oracle Cloud Infrastructure (OCI) Supercluster.

Everyone can generate beautiful music with Suno AI and OCI (2:06)

Why run on OCI AI infrastructure?

Leading AI performance and value

OCI AI infrastructure provides the highest-tier performance and value for all AI workloads—including inferencing, training, and AI assistants.

Scale up to 32,768 GPUs

Only OCI Supercluster offers industry-leading scale with bare metal compute so you can accelerate training for trillion-parameter AI models.

Enable sovereign AI

Oracle’s distributed cloud enables you to deploy AI infrastructure anywhere to help meet performance, security, and AI sovereignty requirements.

AI innovators leverage OCI to host, train, and inference next-generation AI models.

Explore OCI Supercluster for large-scale AI training

Overview

OCI Supercluster enables you to deploy up to an industry-leading 32,768 GPUs per cluster, leveraging RDMA cluster networking and local storage to achieve rapid training and inferencing on large-scale AI models.

OCI Supercluster diagram showing bare metal compute powered by NVIDIA GPUs, storage options (such as block, object, and file storage), and cluster networking consisting of RDMA over Converged Ethernet providing microsecond latency and 1.6 Tb/sec bandwidth.

Storage for Supercluster

Through OCI Supercluster, customers can access local, block, object, and file storage for exascale computing. Among major cloud providers, OCI offers the highest capacity of high-performance local NVMe storage, which allows more frequent checkpointing during training runs and therefore faster recovery from failures.
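
As a rough illustration of why more frequent checkpointing shortens recovery, the sketch below estimates how much training work is redone after a failure. The function name and the example intervals are hypothetical and do not come from Oracle. The sketch assumes a failure is equally likely at any moment between two checkpoints and ignores the time needed to write or reload a checkpoint.

```python
def expected_lost_minutes(checkpoint_interval_min: float) -> float:
    """Average training time lost per failure, in minutes.

    Assumes a failure is equally likely at any moment between two
    checkpoints, so on average half an interval of work is redone.
    """
    if checkpoint_interval_min <= 0:
        raise ValueError("checkpoint interval must be positive")
    return checkpoint_interval_min / 2


if __name__ == "__main__":
    # Hypothetical intervals, chosen only for illustration.
    for interval in (120, 30, 10):
        lost = expected_lost_minutes(interval)
        print(f"checkpoint every {interval} min: "
              f"about {lost:.0f} min of work redone per failure")
```

Under these assumptions, halving the checkpoint interval halves the expected rework, which is why fast local storage that makes checkpoints cheap can pay off on long training runs.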

HPC file systems, including BeeGFS, GlusterFS, Lustre, and WEKA, can be used for AI training at scale without compromising performance.

Networking for Supercluster

High-speed RDMA cluster networking powered by Mellanox’s ConnectX-5 100 Gb/sec network interface cards with RDMA over Converged Ethernet v2 allows you to create large clusters of GPU instances with the same ultralow-latency networking and application scalability you expect on-premises.

You don’t pay extra for RDMA capability, block storage, or network bandwidth, and the first 10 TB of egress is free.

The diagram shows a Supercluster RDMA network with eight NVIDIA A100 GPUs per node connected through a full-duplex network fabric with a total of 1.6 Tb/sec internode bandwidth.
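
To put these figures in perspective, the short calculation below relates the per-node total to the per-GPU share and to the 100 Gb/sec interface speed mentioned above. It is only back-of-the-envelope arithmetic: it assumes the 1.6 Tb/sec total is split evenly across the eight GPUs and is a simple sum of 100 Gb/sec ports. The actual port layout is not stated here, and the function names are hypothetical.

```python
import math

TOTAL_NODE_BANDWIDTH_GBPS = 1600  # 1.6 Tb/sec per node, from the diagram description
GPUS_PER_NODE = 8                 # from the diagram description
NIC_PORT_SPEED_GBPS = 100         # 100 Gb/sec interface speed, from the networking text


def per_gpu_bandwidth_gbps(total_gbps: float, gpus: int) -> float:
    """Bandwidth share per GPU, assuming an even split across GPUs."""
    return total_gbps / gpus


def ports_needed(total_gbps: float, port_gbps: float) -> int:
    """Number of ports required if the total is a plain sum of port speeds."""
    return math.ceil(total_gbps / port_gbps)


if __name__ == "__main__":
    share = per_gpu_bandwidth_gbps(TOTAL_NODE_BANDWIDTH_GBPS, GPUS_PER_NODE)
    ports = ports_needed(TOTAL_NODE_BANDWIDTH_GBPS, NIC_PORT_SPEED_GBPS)
    print(f"per-GPU share: {share:.0f} Gb/sec")
    print(f"100 Gb/sec ports per node: {ports}")
```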

Compute for Supercluster

OCI bare metal instances powered by NVIDIA H100 and A100 GPUs enable customers to run large AI models for workloads such as deep learning, conversational AI, and generative AI. With Supercluster, customers can scale up to 32,768 A100 GPUs per cluster.

The diagram shows GPU cluster nodes powered by NVIDIA A100 GPUs and networking with less than two-microsecond latency.

How OCI Supercluster works

Watch Chief Technical Architect Pradeep Vincent explain how OCI Supercluster powers the training and inferencing of machine learning models, scaling to tens of thousands of NVIDIA GPUs.

Typical AI infrastructure use cases

Train AI models on OCI bare metal instances powered by GPUs, RDMA cluster networking, and OCI Data Science.


Deep learning training and inferencing diagram

Protecting the billions of financial transactions that happen every day requires enhanced AI tools that can analyze large amounts of historical customer data. AI models running on OCI Compute powered by NVIDIA GPUs, together with model management tools such as OCI Data Science and open source models, help financial institutions mitigate fraud.


Fraud detection augmented by AI diagram

AI is often used to analyze various types of medical images (such as X-rays and MRIs) in a hospital. Trained models can help prioritize cases that need immediate review by a radiologist and report conclusive results on others.


AI-based medical image analysis diagram, description below
Trained models running on OCI Compute powered by GPUs can help analyze medical images and provide immediate conclusive results or prioritize images for further review.

Drug discovery is a time-consuming and expensive process that can take many years and cost millions of dollars. By leveraging AI infrastructure and analytics, researchers can accelerate drug discovery. Additionally, OCI Compute powered by NVIDIA GPUs, along with AI workflow management tools such as BioNeMo, enables customers to curate and preprocess their data.


Using AI to accelerate drug discovery, description below
By leveraging AI infrastructure and analytics, researchers can accelerate drug discovery and curate and preprocess their data.

AI infrastructure customer successes

Explore more customer stories

Get started with OCI AI infrastructure

Try Oracle AI and get a 30-day trial

Oracle offers a free pricing tier for most AI services as well as a free trial account with US$300 in credits to try additional cloud services. AI services are a collection of offerings, including generative AI, with prebuilt machine learning models that make it easier for developers to apply AI to applications and business operations.

  • Which Oracle AI and ML services offer a free pricing tier?

    • OCI Speech
    • OCI Language
    • OCI Vision
    • OCI Document Understanding
    • Machine Learning in Oracle Database
    • OCI Data Labeling

    For OCI Data Science, you pay only for the compute and storage you use.

Additional resources

Learn more about RDMA cluster networking, GPU instances, bare metal servers, and more.

See how much you can save with OCI

Oracle Cloud pricing is simple, with consistent low pricing worldwide, supporting a wide range of use cases. To estimate your costs, use the cost estimator and configure the services to suit your needs.

Experience the difference

  • 1/4 the outbound bandwidth costs
  • 3X the compute price-performance
  • Same low price in every region
  • Low pricing without long-term commitments

Access AI subject matter experts

Get help with building your next AI solution or deploying your workload on OCI AI infrastructure.

  • Oracle’s AI subject matter experts can answer questions such as

    • How do I get started with Oracle Cloud?
    • What kinds of AI workloads can I run on OCI?
    • What types of AI services does OCI offer?