
GPUs for AI Innovators

OCI Compute enables machine learning (ML) engineers to build next-generation AI models for any workload. The combination of NVIDIA graphics processing units (GPUs) with up to 640 GB of memory, nonblocking remote direct memory access (RDMA) networks with less than two microseconds of latency, and locally attached solid-state drives provides superior price-performance and scalability for AI training and inferencing.

Oracle CloudWorld: Conversation between Oracle CEO Safra Catz and NVIDIA CEO Jensen Huang (9:58)

Talk with Oracle about accelerating your GPU workloads.

WEKA on OCI delivers 2 terabytes per second of performance.

See how OCI and NVIDIA power next-generation AI models

Customers such as Adept, an ML research and product lab developing a universal AI teammate, are using the power of OCI and NVIDIA technologies to build the next generation of AI models. Running thousands of NVIDIA GPUs on clusters of OCI bare metal compute instances and capitalizing on OCI’s network bandwidth, Adept can train large-scale AI and ML models faster and more economically than before.

Adept builds a powerful AI teammate for everyone with Oracle and NVIDIA

“With the scalability and computing power of OCI and NVIDIA technology, we are training a neural network to use every software application, website, and API in existence—building on the capabilities that software makers have already created.”

David Luan, CEO
Adept

Aleph Alpha builds self-learning artificial intelligence on OCI

“This is a new generation model, and in order to train those you need a new generation of hardware—the old GPU clusters aren’t sufficient anymore. On the industry side we have raised a lot of capital and partnered with Oracle. We’re building a way to translate an impressive playground task into an enterprise application that creates value.”

Jonas Andrulis, Founder and CEO
Aleph Alpha

“We selected Oracle because of the affordability and performance of the GPUs combined with Oracle’s extensive cloud footprint. GPUs are very important for training deep neural network models. The higher the GPU performance, the better our models. And because we work in several different countries and regions, we needed the infrastructure to support that.”

Nils Helset, Cofounder and CEO
DigiFarm

VizSeek powers visual search with Oracle

“Our software is a visual search engine, and people expect it to be fast, they expect it to be reliable, and we also need to be able to scale for our customers and make sure our platform is secure. So, this type of architecture really meets all those requirements. We’re able to debug and scale without any error or interruption.”

Rob Hill, Chief Architect
VizSeek

“We moved our hosting to Oracle Cloud using managed Kubernetes and saved 40% of our hosting costs. We reinvested that cost savings into GPU shapes and have been able to deliver even more advanced computer vision technology to our customers. Oracle Cloud is instrumental in helping us scale and innovate.”

Jenny Griffiths, Founder and CEO
Snap Vision

Explore how OCI supports model training and parallel applications

High-speed RDMA cluster networks

High performance computing on Oracle Cloud Infrastructure provides powerful, cost-effective computing capabilities to solve complex mathematical and scientific problems across industries.

OCI's bare metal servers coupled with Oracle’s cluster networking provide access to ultralow-latency (less than 2 microseconds across clusters of tens of thousands of cores) RDMA over Converged Ethernet (RoCE) v2.

The chart shows the performance of Oracle’s cluster networking fabric. With popular CFD codes, OCI sustains scaling efficiency above 100% at fewer than 10,000 simulation cells per core—the same performance you would see on-premises. Because bare metal HPC machines carry no virtualization penalty, they can dedicate every core on the node to the workload instead of reserving cores for costly overhead.
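The “above 100%” figure refers to strong-scaling efficiency: measured speedup relative to ideal linear speedup, which can exceed 100% when the shrinking per-core working set starts to fit in cache. A minimal sketch of that calculation, using hypothetical CFD timings rather than OCI measurements:

```python
# Strong-scaling efficiency: measured speedup divided by ideal (linear)
# speedup, expressed as a percentage. Values above 100% are superlinear,
# typically because the per-core working set shrinks enough to fit in
# cache. The timings below are hypothetical, not OCI benchmark results.

def scaling_efficiency(t_base: float, n_base: int, t_n: float, n: int) -> float:
    """Efficiency (%) at n cores relative to a baseline run on n_base cores."""
    speedup = t_base / t_n
    ideal = n / n_base
    return 100.0 * speedup / ideal

# A fixed-size CFD case: 1,000 s on 36 cores, 240 s on 144 cores.
eff = scaling_efficiency(t_base=1000.0, n_base=36, t_n=240.0, n=144)
print(f"{eff:.1f}%")  # about 104.2%, i.e. superlinear
```

Below roughly 10,000 cells per core the cache effect dominates; past that point, communication overhead pulls efficiency back toward (and below) 100%.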

High performance computing (HPC) on OCI

HPC on OCI rivals the performance of on-premises solutions with the elasticity and consumption-based costs of the cloud, letting you scale to tens of thousands of cores on demand.

With HPC on OCI, you get access to high-frequency processors; fast and dense local storage; high-throughput, ultralow-latency RDMA cluster networks; and the tools to automate and run jobs seamlessly.

OCI can provide latencies as low as 1.7 microseconds—lower than any other cloud vendor, according to an analysis by Exabyte.io. By enabling RDMA-connected clusters, OCI has expanded cluster networking for bare metal servers equipped with NVIDIA A100 GPUs.

The groundbreaking backend network fabric lets customers use Mellanox’s ConnectX-5 100 Gb/sec network interface cards with RDMA over Converged Ethernet (RoCE) v2 to create clusters with the same low-latency networking and application scalability that can be achieved on-premises.

Unique bare metal GPU clusters

OCI’s bare metal NVIDIA GPU instances offer startups a high performance computing platform for applications that rely on machine learning, image processing, and massively parallel jobs. GPU instances are ideally suited for model training, inference computation, physics and image rendering, and other massively parallel workloads.

The BM.GPU4.8 instances have eight NVIDIA A100 GPUs and use Oracle’s low-latency cluster networking, based on RDMA over Converged Ethernet (RoCE) with less than 2 microseconds of latency. Customers can host clusters of more than 500 GPUs and easily scale on demand.
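As an illustration, an instance of this shape can be launched with the OCI CLI. The shape name comes from the text above; the availability domain and every OCID below are placeholders you would replace with values from your own tenancy (a sketch, not a verified deployment script):

```shell
# Launch a single BM.GPU4.8 bare metal instance with the OCI CLI.
# The availability domain and all OCIDs are placeholder assumptions;
# substitute values from your own tenancy before running.
oci compute instance launch \
  --availability-domain "Uocm:PHX-AD-1" \
  --compartment-id "ocid1.compartment.oc1..example" \
  --shape "BM.GPU4.8" \
  --subnet-id "ocid1.subnet.oc1..example" \
  --image-id "ocid1.image.oc1..example" \
  --display-name "a100-training-node"
```

Multi-node RDMA clusters are built from instances like this attached to a cluster network rather than launched individually.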

What’s included with GPU instances on OCI?

Dedicated engineering support

OCI provides world-class technical experts to help you get up and running. We remove the technical barriers of a complex deployment—from planning to launch—to help ensure your success.

  • Solution architecture development
  • Networking, security, and auditing
  • Onboarding to OCI
  • Application migration
  • Post-migration training

Improved economics

OCI is built for enterprises seeking higher performance, consistently lower costs, and easier cloud migration for their current on-premises applications. Compared with AWS, OCI offers:

  • Private network connectivity that costs 74% less
  • More than 3X better price-performance for compute
  • Up to 44% less expensive infrastructure with local solid-state disks, twice the RAM, RDMA networking, and a performance SLA
  • 20X the input/output operations per second for less than half the cost
October 19, 2022

Faster large model training with NVIDIA A100 80GB Tensor Core GPU

Andrew Butterfield, Principal Product Manager, GPU and HPC, Oracle

We’re excited to announce the launch of the Oracle Cloud Infrastructure GM4 instance powered by the NVIDIA A100 80GB Tensor Core GPU. This new shape continues to fuel the explosive growth in large AI model training and inferencing by pairing the A100 80GB with OCI’s high-throughput, ultralow-latency RDMA cluster networking, which lets customers create large-scale clusters of NVIDIA A100 GPUs.

Read the complete post

Additional cloud architecture and deployment resources

OCI Cloud Adoption Framework (CAF)

IDC’s view on OCI and hybrid cloud

Omdia’s perspective on why all clouds are not the same

OCI for the modern enterprise