
Deploying LLMs Using Hugging Face and Kubernetes on OCI

Introduction

Large language models (LLMs) have made significant strides in text generation, problem-solving, and instruction following. As businesses adopt LLMs to build cutting-edge solutions, the need for scalable, secure, and efficient deployment platforms grows. Kubernetes has emerged as the preferred deployment platform thanks to its scalability, flexibility, portability, and resilience.

This demo shows how to deploy fine-tuned LLM inference containers on Oracle Cloud Infrastructure Container Engine for Kubernetes (OKE), a managed Kubernetes service that simplifies deployments and operations at enterprise scale. With OKE, enterprises can keep their custom models and data sets within their own tenancy instead of relying on a third-party inference API.

We’ll use Text Generation Inference (TGI) as the inference framework to expose the LLMs.
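As a rough sketch of what this looks like, the snippet below uses the official Kubernetes Python client to create a Deployment that runs the TGI container on an OKE cluster. The model ID, namespace, and GPU request are illustrative assumptions, and it presumes your kubeconfig already points at the cluster.

```python
from kubernetes import client, config

# Assumes a kubeconfig for the OKE cluster is already set up,
# e.g. generated with `oci ce cluster create-kubeconfig`.
config.load_kube_config()

# Hypothetical model ID; substitute your own fine-tuned model.
MODEL_ID = "my-org/my-finetuned-model"

container = client.V1Container(
    name="tgi",
    image="ghcr.io/huggingface/text-generation-inference:latest",
    args=["--model-id", MODEL_ID],
    ports=[client.V1ContainerPort(container_port=80)],  # TGI image's default port
    resources=client.V1ResourceRequirements(
        # Assumes a GPU node pool with the NVIDIA device plugin installed.
        limits={"nvidia.com/gpu": "1"}
    ),
)

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="tgi-server", labels={"app": "tgi"}),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels={"app": "tgi"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "tgi"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
print("Created Deployment 'tgi-server' in namespace 'default'")
```

In practice, you would also create a Service (for example, of type LoadBalancer) to expose the pods outside the cluster, and typically mount persistent storage for the model weights so pods don't re-download them on every restart.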

Demo

Demo: Deploying LLMs Using Hugging Face and Kubernetes on OCI (1:30)

Prerequisites and setup

  1. Oracle Cloud account (sign-up page)
  2. Oracle Cloud Infrastructure (documentation)
  3. OCI Generative AI (documentation)
  4. OCI Container Engine for Kubernetes (documentation)
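
Once the prerequisites are in place and the TGI Deployment is reachable, a quick smoke test against TGI's /generate endpoint might look like the following. The endpoint address is a placeholder for the external IP of whatever Service fronts the pods.

```python
import requests

# Hypothetical address: the external IP of a LoadBalancer Service
# fronting the TGI pods.
TGI_ENDPOINT = "http://<load-balancer-ip>/generate"

payload = {
    "inputs": "Explain what Oracle Container Engine for Kubernetes is.",
    "parameters": {"max_new_tokens": 200, "temperature": 0.7},
}

response = requests.post(TGI_ENDPOINT, json=payload, timeout=60)
response.raise_for_status()

# TGI returns a JSON object with the completion under "generated_text".
print(response.json()["generated_text"])
```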