Large language models (LLMs) have made significant strides in text generation, problem-solving, and instruction following. As businesses use LLMs to build cutting-edge solutions, the need for scalable, secure, and efficient deployment platforms grows. Kubernetes has become the preferred option for its scalability, flexibility, portability, and resilience.
In this demo, we show how to deploy fine-tuned LLM inference containers on Oracle Cloud Infrastructure Kubernetes Engine (OKE), a managed Kubernetes service that simplifies deployments and operations at scale for enterprises. Running inference on OKE also lets enterprises keep their custom models and datasets within their own tenancy, without relying on a third-party inference API.
We’ll use Hugging Face’s Text Generation Inference (TGI) as the serving framework that exposes the fine-tuned LLMs over HTTP.
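To preview what this looks like, below is a minimal sketch of a Kubernetes Deployment and Service that run the public TGI container image on a GPU node. The model ID, shared-memory size, and resource values are illustrative placeholders, not the exact configuration from this demo; adjust them for your fine-tuned model and OKE node shape.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tgi-inference
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tgi-inference
  template:
    metadata:
      labels:
        app: tgi-inference
    spec:
      containers:
        - name: tgi
          # Upstream TGI image; pin a specific tag in production.
          image: ghcr.io/huggingface/text-generation-inference:latest
          # Placeholder model ID -- replace with your fine-tuned model
          # or a path to weights mounted into the container.
          args: ["--model-id", "mistralai/Mistral-7B-Instruct-v0.2"]
          ports:
            - containerPort: 80  # TGI's default port inside the container
          resources:
            limits:
              nvidia.com/gpu: 1  # schedule onto a GPU worker node
          volumeMounts:
            - name: shm
              mountPath: /dev/shm
      volumes:
        # TGI uses shared memory (e.g., for NCCL when sharding across GPUs);
        # an in-memory emptyDir is the usual Kubernetes equivalent of --shm-size.
        - name: shm
          emptyDir:
            medium: Memory
            sizeLimit: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: tgi-inference
spec:
  selector:
    app: tgi-inference
  ports:
    - port: 80
      targetPort: 80
```

Once the pods are ready, clients inside the cluster can POST JSON prompts to TGI’s `/generate` endpoint through the `tgi-inference` Service.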