
Deploying LLMs Using Hugging Face and Kubernetes on OCI

Introduction

Large language models (LLMs) have made significant strides in text generation, problem-solving, and instruction following. As businesses adopt LLMs to build cutting-edge solutions, the need for scalable, secure, and efficient deployment platforms grows. Kubernetes has emerged as the preferred deployment platform thanks to its scalability, flexibility, portability, and resilience.

This demo shows how to deploy fine-tuned LLM inference containers on Oracle Cloud Infrastructure Container Engine for Kubernetes (OKE), a managed Kubernetes service that simplifies deployments and operations at enterprise scale. With OKE, enterprises can keep their custom models and data sets within their own tenancy instead of relying on a third-party inference API.

We’ll use Text Generation Inference (TGI) as the inference framework to expose the LLMs.
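As a rough sketch of what this looks like, the snippet below uses the official Kubernetes Python client to create a Deployment that runs the TGI container on an OKE cluster. The model ID, namespace, and GPU request are illustrative assumptions, and it presumes your kubeconfig already points at the cluster.

```python
from kubernetes import client, config

# Assumes a kubeconfig for the OKE cluster is already set up,
# e.g. generated with `oci ce cluster create-kubeconfig`.
config.load_kube_config()

# Hypothetical model ID; substitute your own fine-tuned model.
MODEL_ID = "my-org/my-finetuned-model"

container = client.V1Container(
    name="tgi",
    image="ghcr.io/huggingface/text-generation-inference:latest",
    args=["--model-id", MODEL_ID],
    ports=[client.V1ContainerPort(container_port=80)],  # TGI image's default port
    resources=client.V1ResourceRequirements(
        # Assumes a GPU node pool with the NVIDIA device plugin installed.
        limits={"nvidia.com/gpu": "1"}
    ),
)

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="tgi-server", labels={"app": "tgi"}),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels={"app": "tgi"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "tgi"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
print("Created Deployment 'tgi-server' in namespace 'default'")
```

In practice, you would also create a Service (for example, of type LoadBalancer) to expose the pods outside the cluster, and typically mount persistent storage for the model weights so pods don't re-download them on every restart.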

Demo

Demo: Deploying LLMs Using Hugging Face and Kubernetes on OCI (1:30)

Prerequisites and setup

  1. Oracle Cloud account (sign-up page)
  2. Oracle Cloud Infrastructure (documentation)
  3. OCI Generative AI (documentation)
  4. OCI Container Engine for Kubernetes (documentation)
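
Once the prerequisites are in place and the TGI Deployment is reachable, a quick smoke test against TGI's /generate endpoint might look like the following. The endpoint address is a placeholder for the external IP of whatever Service fronts the pods.

```python
import requests

# Hypothetical address: the external IP of a LoadBalancer Service
# fronting the TGI pods.
TGI_ENDPOINT = "http://<load-balancer-ip>/generate"

payload = {
    "inputs": "Explain what Oracle Container Engine for Kubernetes is.",
    "parameters": {"max_new_tokens": 200, "temperature": 0.7},
}

response = requests.post(TGI_ENDPOINT, json=payload, timeout=60)
response.raise_for_status()

# TGI returns a JSON object with the completion under "generated_text".
print(response.json()["generated_text"])
```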