AI Solution

NVIDIA NIM inference microservice at scale with OCI Kubernetes Engine

Introduction

How can you deliver inference requests at scale for your large language model and accelerate your AI deployment? By deploying NVIDIA NIM, an enterprise-ready inference microservice, on Oracle Cloud Infrastructure (OCI) Kubernetes Engine (OKE). In this demo, we’ll show how to deploy NVIDIA NIM on OKE with the model repository hosted on OCI Object Storage. With a Helm deployment, you can easily scale the number of replicas up or down to match the volume of inference requests, and monitor the service as it runs. OCI Object Storage lets you deploy models from anywhere, with support for various model types. Powered by NVIDIA GPUs, NIM helps you get maximum throughput and minimum latency for your inference requests.
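For example, once the NIM Helm release is running on OKE, the replica count can be adjusted programmatically as traffic changes. The following is a minimal sketch using the Kubernetes Python client; the deployment name nim-llm and the namespace nim are assumptions, so substitute the names created by your own Helm release.

# A minimal sketch of scaling a NIM deployment on OKE.
# Assumptions: the Helm release created a deployment named "nim-llm"
# in the "nim" namespace; replace both with your actual values.
from kubernetes import client, config

def scale_nim(replicas: int, name: str = "nim-llm", namespace: str = "nim") -> None:
    config.load_kube_config()  # uses the kubeconfig for your OKE cluster
    apps = client.AppsV1Api()
    # Patch only the scale subresource so the rest of the deployment spec is untouched.
    apps.patch_namespaced_deployment_scale(
        name=name,
        namespace=namespace,
        body={"spec": {"replicas": replicas}},
    )

# Scale up ahead of a burst of inference requests; scale back down when traffic drops.
scale_nim(3)

Equivalently, you could rerun helm upgrade with an updated replica count in your values file; the exact parameter name depends on the chart you deploy.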

Demo

Demo: NVIDIA NIM inference microservice at scale with OCI Kubernetes Engine (1:18)

Prerequisites and setup

  1. Oracle Cloud account—sign-up page
  2. Access to VM.GPU.A10.1 powered by a single NVIDIA A10 Tensor Core GPU—service limits
  3. Instance principals—documentation
  4. NVIDIA AI Enterprise, available through the OCI Marketplace—documentation
  5. A Hugging Face account with a user access token—documentation
  6. OCI Kubernetes Engine—documentation
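
With the prerequisites in place and NIM deployed, you can verify the service end to end by sending a test inference request. NIM exposes an OpenAI-compatible API, so a plain HTTP client works; in the sketch below, the endpoint URL and model name are assumptions that depend on how your service is exposed and which model you deployed.

# A minimal sketch of a test inference request against a NIM endpoint.
# Assumptions: the OKE service is reachable locally, for example via
# kubectl port-forward svc/nim-llm 8000:8000, and the model name is a
# placeholder for whichever model your NIM instance serves.
import requests

NIM_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "meta/llama3-8b-instruct",  # placeholder; list available models via GET /v1/models
    "messages": [{"role": "user", "content": "Hello from OKE!"}],
    "max_tokens": 64,
}

response = requests.post(NIM_URL, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])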