AI Solution

NVIDIA NIM inference microservice at scale with OCI Kubernetes Engine

Introduction

How can you deliver inference requests at scale for your large language model and accelerate your AI deployment? By deploying NVIDIA NIM, an enterprise-ready inference microservice, on Oracle Cloud Infrastructure (OCI) Kubernetes Engine (OKE). In this demo, we’ll show how to deploy NVIDIA NIM on OKE with the model repository hosted on OCI Object Storage. With a Helm deployment, you can easily scale the number of replicas up or down to match the volume of inference requests, and monitor the service as it runs. OCI Object Storage lets you deploy models from anywhere, with support for various model types. Powered by NVIDIA GPUs, NIM helps you get maximum throughput and minimum latency for your inference requests.
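For example, once the NIM Helm release is running on OKE, the replica count can be adjusted programmatically as traffic changes. The following is a minimal sketch using the Kubernetes Python client; the deployment name nim-llm and the namespace nim are assumptions, so substitute the names created by your own Helm release.

# A minimal sketch of scaling a NIM deployment on OKE.
# Assumptions: the Helm release created a deployment named "nim-llm"
# in the "nim" namespace; replace both with your actual values.
from kubernetes import client, config

def scale_nim(replicas: int, name: str = "nim-llm", namespace: str = "nim") -> None:
    config.load_kube_config()  # uses the kubeconfig for your OKE cluster
    apps = client.AppsV1Api()
    # Patch only the scale subresource so the rest of the deployment spec is untouched.
    apps.patch_namespaced_deployment_scale(
        name=name,
        namespace=namespace,
        body={"spec": {"replicas": replicas}},
    )

# Scale up ahead of a burst of inference requests; scale back down when traffic drops.
scale_nim(3)

Equivalently, you could rerun helm upgrade with an updated replica count in your values file; the exact parameter name depends on the chart you deploy.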

Demo

Demo: NVIDIA NIM inference microservice at scale with OCI Kubernetes Engine (1:18)

Prerequisites and setup

  1. Oracle Cloud account—sign-up page
  2. Access to VM.GPU.A10.1 powered by a single NVIDIA A10 Tensor Core GPU—service limits
  3. Instance principals—documentation
  4. NVIDIA AI Enterprise, available through the OCI Marketplace—documentation
  5. A Hugging Face account with a user access token—documentation
  6. OCI Kubernetes Engine—documentation
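
With the prerequisites in place and NIM deployed, you can verify the service end to end by sending a test inference request. NIM exposes an OpenAI-compatible API, so a plain HTTP client works; in the sketch below, the endpoint URL and model name are assumptions that depend on how your service is exposed and which model you deployed.

# A minimal sketch of a test inference request against a NIM endpoint.
# Assumptions: the OKE service is reachable locally, for example via
# kubectl port-forward svc/nim-llm 8000:8000, and the model name is a
# placeholder for whichever model your NIM instance serves.
import requests

NIM_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "meta/llama3-8b-instruct",  # placeholder; list available models via GET /v1/models
    "messages": [{"role": "user", "content": "Hello from OKE!"}],
    "max_tokens": 64,
}

response = requests.post(NIM_URL, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])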