Simplify the operation of enterprise-grade Kubernetes at scale. Easily deploy and manage resource-intensive workloads, such as AI, with automatic scaling, patching, and upgrades.
OKE is the lowest-cost Kubernetes service among the major hyperscalers, especially for serverless Kubernetes.
OKE automatically adjusts compute resources based on demand, which can reduce your costs.
GPUs can be scarce, but OKE job scheduling makes it easy to maximize resource utilization.
OKE is consistent across clouds and on-premises, enabling portability and avoiding vendor lock-in.
OKE reduces the time and cost needed to manage the complexities of Kubernetes infrastructure.
Automatic upgrades and security patching boost reliability for the control plane and worker nodes.
Fully automated, native cross-region recovery is available using OCI Full Stack Disaster Recovery.
OCI Kubernetes Engine (OKE) is certified by the Cloud Native Computing Foundation (CNCF) for both Kubernetes Platform and Kubernetes AI Platform conformance.
These certifications validate OKE's adherence to open standards, helping ensure that your cloud native and AI/ML workloads run on a platform that follows industry best practices and interoperates across the global Kubernetes ecosystem.
Read more about OCI's new AI Conformance certification.
Kubernetes is the go-to platform to deploy AI workloads. OKE powers Oracle Cloud Infrastructure (OCI) AI services.
– The initial build stage of an AI project involves defining the problem and preparing data to create models.
– Kubernetes clusters can significantly improve efficiency by granting shared access to expensive and often limited GPU resources while providing secure and centrally managed environments.
– Kubeflow, an open source project built for Kubernetes, provides a comprehensive framework for building, training, and deploying models.
OKE is built on top of OCI, offering a complete stack of high-performance infrastructure designed for AI/ML workloads, including:
– The full range of NVIDIA GPUs, including the H100, A100, and A10
– Ultrafast, low-latency RDMA networks
Using OKE self-managed nodes, you can run AI/ML model-building workloads on your Kubernetes clusters.
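As a rough illustration of shared GPU access from a cluster, here is a minimal sketch of a pod that requests a single GPU through the standard nvidia.com/gpu resource. It assumes the NVIDIA device plugin is installed; the image, names, and entrypoint are placeholders rather than OKE defaults.

```yaml
# Minimal sketch: a model-building pod that requests one GPU.
# Assumes the NVIDIA device plugin is running on the cluster;
# image, names, and command are illustrative placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: model-build
spec:
  restartPolicy: Never
  containers:
  - name: builder
    image: nvcr.io/nvidia/pytorch:24.01-py3   # placeholder image
    command: ["python", "train.py"]           # placeholder entrypoint
    resources:
      limits:
        nvidia.com/gpu: 1   # schedule onto a node with a free GPU
```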
– In model training, data scientists select an algorithm and initiate training jobs using prepared data. This stage requires sophisticated scheduling systems to handle the jobs efficiently.
– Kubernetes projects such as Volcano and Kueue help handle such requirements and make efficient use of compute resources; a Kueue-based sketch follows below.
– Large-scale distributed training requires low-latency internode communications in the cluster. This is where a specialized ultrafast network with remote direct memory access (RDMA) is needed. It enables data to be moved directly to or from an application’s memory, bypassing the CPU to reduce latency.
Using OKE self-managed nodes, you can run AI/ML training on your Kubernetes clusters.
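To make the scheduling point concrete, here is a minimal sketch of a training Job queued through Kueue. The queue name user-queue is an assumption and must match a LocalQueue set up by a cluster administrator; the image and resource figures are placeholders.

```yaml
# Minimal sketch: a suspended training Job handed to Kueue for admission.
# "user-queue" is an assumed LocalQueue name; Kueue unsuspends the Job
# once the queue has quota for the requested GPUs.
apiVersion: batch/v1
kind: Job
metadata:
  name: train-job
  labels:
    kueue.x-k8s.io/queue-name: user-queue
spec:
  suspend: true        # created suspended so Kueue controls the start time
  parallelism: 4
  completions: 4
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: trainer
        image: nvcr.io/nvidia/pytorch:24.01-py3   # placeholder image
        resources:
          requests:
            nvidia.com/gpu: 1
          limits:
            nvidia.com/gpu: 1
```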
– AI model inferencing is where Kubernetes really shines. Kubernetes can automatically scale the number of inference pods up or down based on demand, ensuring efficient use of resources, as sketched below.
– Kubernetes provides sophisticated resource management, including the ability to specify CPU and memory limits for containers.
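A minimal sketch of both ideas: an inference Deployment with explicit CPU and memory requests and limits, plus a HorizontalPodAutoscaler that scales it on CPU utilization. All names, images, and numbers are illustrative assumptions.

```yaml
# Minimal sketch: an inference Deployment with explicit CPU/memory
# requests and limits, plus an HPA that scales it on CPU utilization.
# Names, image, and numbers are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: model-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: model-server
  template:
    metadata:
      labels:
        app: model-server
    spec:
      containers:
      - name: server
        image: ghcr.io/example/model-server:latest   # placeholder image
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
          limits:
            cpu: "2"
            memory: "4Gi"
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: model-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: model-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # add pods when average CPU use exceeds 70%
```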
OKE is designed with resilience at its core, combining Kubernetes' built-in pod autoscaling with worker node autoscaling based on usage. Worker nodes can be distributed across multiple fault domains or availability domains for high availability.
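One way workloads can take advantage of that distribution is a Kubernetes topology spread constraint, sketched below. The zone topology key is the standard Kubernetes label; the pod name, image, and app label are assumptions.

```yaml
# Minimal sketch: spread replicas evenly across availability domains
# (zones) so a single-domain outage leaves replicas running elsewhere.
# Name, image, and label values are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: spread-example
  labels:
    app: model-server
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone   # standard zone label
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: model-server
  containers:
  - name: server
    image: ghcr.io/example/model-server:latest   # placeholder image
```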
OKE virtual nodes provide a serverless Kubernetes experience. They only need to scale at the pod level, without ever scaling worker nodes. This allows for quicker scaling and more economical management since service fees are based solely on the pods in use.
Virtual nodes are well-suited for inference workloads and can use Arm processors, which are becoming a much more attractive option for AI inference—especially when GPUs are in short supply.
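Here is a minimal sketch of an inference pod that asks for Arm capacity via the standard kubernetes.io/arch label. Whether a given virtual node pool exposes arm64 depends on its configured shape, so treat the selector, image, and resource figures as assumptions; on virtual nodes, the pod's requested resources are what drive per-pod billing.

```yaml
# Minimal sketch: an inference pod that targets arm64 capacity using
# the standard kubernetes.io/arch node label. Image and names are
# placeholders; the pod's requests drive per-pod fees on virtual nodes.
apiVersion: v1
kind: Pod
metadata:
  name: arm-inference
spec:
  nodeSelector:
    kubernetes.io/arch: arm64   # assumes an Arm-based (e.g., Ampere) pool
  containers:
  - name: server
    image: ghcr.io/example/model-server:arm64   # placeholder image
    resources:
      requests:
        cpu: "1"
        memory: "2Gi"
```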
OKE offers a lower total cost of ownership and faster time to market.
OKE simplifies Kubernetes operations at scale.
Future-proof your applications with an OKE-centric microservices architecture.
Kubernetes is an open source platform for managing and scaling clusters of containerized applications and services.
Customers choose OKE because it delivers the results—and reliability—they need to run and grow their business.
