AI Solution

NVIDIA NIM inference microservice at scale with OCI Container Engine for Kubernetes

Introduction

How can you deliver inference requests at scale for your large language model and accelerate your AI deployment? By deploying NVIDIA NIM, an enterprise-ready inference microservice, on Oracle Cloud Infrastructure (OCI) Container Engine for Kubernetes (OKE). In this demo, we’ll show how to deploy NVIDIA NIM on OKE with the model repository hosted on OCI Object Storage. With a Helm deployment, you can easily scale the number of replicas up or down to match the volume of inference requests, and monitoring is straightforward. Hosting the model repository on OCI Object Storage lets you deploy models from anywhere and supports a wide variety of model types. Running on NVIDIA GPUs, NIM helps you achieve maximum throughput and minimum latency for your inference requests.
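
As a minimal sketch of the scaling workflow, the commands below install a NIM Helm release and adjust its replica count. The chart name (nim/nim-llm), release name (my-nim), and value key (replicaCount) are illustrative assumptions; check the NVIDIA NIM Helm chart documentation for the exact names used by your chart version.

    # Install the NIM chart with an initial replica count
    # (release, chart, and value names are illustrative)
    helm upgrade --install my-nim nim/nim-llm \
        --namespace nim --create-namespace \
        --set replicaCount=2

    # Scale out as inference traffic grows, keeping all other values
    helm upgrade my-nim nim/nim-llm --reuse-values --set replicaCount=4

    # Alternatively, scale the underlying deployment directly
    kubectl -n nim scale deployment my-nim --replicas=4

Scaling through Helm keeps the replica count recorded in the release values, while kubectl scale is handy for quick, manual adjustments between releases.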

Prerequisites and setup

  1. Oracle Cloud account—sign-up page
  2. Access to VM.GPU.A10.1 powered by a single NVIDIA A10 Tensor Core GPU—service limits
  3. Instance principals—documentation (a sample dynamic group rule and policy follow this list)
  4. NVIDIA AI Enterprise, part of the OCI Marketplace—documentation
  5. Hugging Face with user access tokens—documentation
  6. OCI Container Engine for Kubernetes—documentation
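
Instance principals (item 3 above) are what let the GPU worker nodes pull the model repository from OCI Object Storage without embedding credentials. As a minimal sketch, assuming a dynamic group named nim-oke-nodes and a compartment named nim-demo (both illustrative names), the matching rule and IAM policy could look like this:

    # Dynamic group matching rule: include all instances in your compartment
    # (replace the OCID with your own compartment's OCID)
    ALL {instance.compartment.id = 'ocid1.compartment.oc1..<unique_id>'}

    # IAM policy granting those instances read access to the model objects
    Allow dynamic-group nim-oke-nodes to read objects in compartment nim-demo

Granting read objects is the least privilege needed to serve models; use manage object-family instead if the nodes also need to upload or update model files.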
