Deploy, scale, and monitor GenAI workloads in minutes with Oracle Cloud Infrastructure (OCI) AI Blueprints. Get prepackaged, OCI-verified deployment blueprints, complete with hardware recommendations, software components, and out-of-the-box monitoring.
Reduce the guesswork of deploying AI workloads, from scaling deployments and verifying driver and application compatibility to managing observability, with blueprints built on OCI-verified best practices.
Deploy and monitor your mission-critical GenAI workloads in minutes with blueprints that include verified hardware recommendations, software stacks, and out-of-the-box monitoring.
Adopt prebuilt connections to third-party observability applications, such as Prometheus, Grafana, and MLflow, to simplify monitoring and observability across AI workloads.
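For example, a workload running on a blueprint could log its training metrics to the preinstalled MLflow instance so they appear alongside the Prometheus and Grafana infrastructure dashboards. A minimal sketch, assuming a hypothetical in-cluster MLflow tracking URI:

```python
import mlflow

# Hypothetical in-cluster endpoint; replace with the MLflow tracking URI
# exposed by your OCI AI Blueprints deployment.
mlflow.set_tracking_uri("http://mlflow.mlflow.svc.cluster.local:5000")
mlflow.set_experiment("llm-fine-tuning")

with mlflow.start_run():
    # Log example hyperparameters and metrics so they show up in the MLflow UI.
    mlflow.log_param("base_model", "meta-llama/Llama-2-7b-hf")
    mlflow.log_param("lora_rank", 8)
    mlflow.log_metric("train_loss", 1.23, step=100)
```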
Simplify the deployment of large language models (LLMs) and vision language models (VLMs) using vLLM, an open source inference and serving engine. Deploy a custom model or select from a variety of open models on Hugging Face.
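As an illustration, serving an open Hugging Face model with the vLLM Python API can look like the sketch below; the model ID and sampling settings are examples, not blueprint defaults.

```python
from vllm import LLM, SamplingParams

# Example only: any vLLM-compatible Hugging Face model ID (or a path to a
# custom model) can be used here.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Summarize what OCI AI Blueprints provides."], params)

for output in outputs:
    print(output.outputs[0].text)
```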
Streamline infrastructure benchmarking for fine-tuning using the MLCommons methodology. The blueprint fine-tunes a quantized Llama-2-70B model on a standard data set.
OCI AI Blueprints enables model tuning using low-rank adaptation (LoRA), a highly efficient method of LLM fine-tuning. Fine-tune a custom LLM or use most open LLMs from Hugging Face.
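As a rough sketch of what LoRA fine-tuning involves (not the blueprint's exact configuration), a Hugging Face model can be wrapped with a LoRA adapter via the peft library; the base model and adapter settings below are illustrative assumptions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Example base model; the blueprint supports custom LLMs and most open
# Hugging Face models.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# LoRA trains small low-rank adapter matrices instead of all model weights.
config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
                    lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically a small fraction of total weights
```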
Before deploying production or research workloads, use a robust precheck blueprint for thorough GPU health validation to proactively detect and address issues. Verify that your GPU infrastructure is primed for high-demand experiments across both single- and multi-node environments.
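The blueprint's checks are more thorough, but a minimal GPU precheck of the kind it automates can be sketched with NVIDIA's NVML bindings; the temperature threshold below is an illustrative assumption.

```python
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        # Illustrative threshold only; real prechecks also cover ECC errors,
        # NVLink/RDMA health, and multi-node connectivity.
        status = "OK" if temp < 85 else "HOT"
        print(f"GPU {i} {name}: {temp}C, {mem.free // 2**20} MiB free [{status}]")
finally:
    pynvml.nvmlShutdown()
```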
Adopt a comprehensive framework for serving LLMs on CPUs using the Ollama platform with a variety of supported models, such as Mistral and Gemma.
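For instance, once the blueprint has Ollama serving a model on CPU nodes, a client could call its HTTP API roughly as follows; the service URL and model name are placeholders.

```python
import requests

# Placeholder URL; point this at the Ollama service exposed by the blueprint.
OLLAMA_URL = "http://localhost:11434/api/generate"

response = requests.post(OLLAMA_URL, json={
    "model": "mistral",            # any model pulled into Ollama, e.g., gemma
    "prompt": "Explain RDMA in one sentence.",
    "stream": False,               # return a single JSON response
})
response.raise_for_status()
print(response.json()["response"])
```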
With this blueprint, you can distribute inference serving across several compute nodes, each typically equipped with one or more GPUs. For example, deploy Llama 405B–sized LLMs across multiple H100 nodes with RDMA using vLLM and LeaderWorkerSet.
Serve LLMs with autoscaling using KEDA, which scales to multiple GPUs and nodes using application metrics, such as inference latency.
Deploy LLMs to a fraction of a GPU with NVIDIA Multi-Instance GPU (MIG) and serve them with vLLM.
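As a rough illustration (not the blueprint's mechanism, which handles this through Kubernetes), a process can be pinned to a single MIG slice by exposing only that slice through CUDA_VISIBLE_DEVICES before starting vLLM; the MIG UUID below is a placeholder.

```python
import os

# Placeholder MIG device UUID; list the real ones with `nvidia-smi -L`.
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-00000000-0000-0000-0000-000000000000"

# Import after setting the variable so vLLM only sees the MIG slice.
from vllm import LLM

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
print(llm.generate(["Hello"])[0].outputs[0].text)
```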
Get your AI application running quickly and efficiently with opinionated hardware recommendations, prepackaged software stacks, and out-of-the-box observability tooling.
Deploy your GenAI workloads with confidence using prepackaged blueprints tested on recommended OCI GPU, CPU, and networking configurations, saving you from time-consuming performance benchmarking and guesswork.
Adopt the necessary frameworks, libraries, and model configurations for popular AI use cases, such as retrieval-augmented generation (RAG), fine-tuning, and inference, or customize use cases for your business needs.
Simplify infrastructure management with automated MLOps tasks, including monitoring, logging, and scaling. Get started quickly with preinstalled tools, such as Prometheus, Grafana, MLflow, and KEDA, for a production-grade environment with minimal effort.
Introducing OCI AI Blueprints, a Kubernetes-based platform for managing AI workloads, with a set of blueprints that can help you deploy, scale, and monitor AI workloads in production in minutes.
Read the complete post.
Try 20+ Always Free cloud services, with a 30-day trial for even more.
Explore OCI AI Blueprints and try them out or deploy them in your production tenancy.
See how Oracle enables customers to consistently save on compute, storage, and networking compared with other cloud hyperscalers.
Interested in learning more about Oracle Cloud Infrastructure? Let one of our experts help.