Oracle Private AI Services Container

The Oracle Private AI Services Container is a robust, secure AI infrastructure that helps regulated organizations, and those with stringent security requirements, address the common compliance and auditability challenges of AI model deployments. These organizations can run private instances of AI models and avoid sharing data with third-party AI providers. The solution also mitigates performance bottlenecks by letting customers securely offload compute-intensive AI tasks, such as vector embedding generation, outside the database while keeping all data within their environment. The container can be deployed in the customer’s tenancy in the public cloud, on private clouds, or on-premises, including in air-gapped environments.

Why Oracle Private AI Services Container?

  • Database Offload

    Offload expensive AI computation, such as vector embedding generation, to free up compute resources on the Oracle AI Database server.

  • Secure AI Infrastructure

    Security best practices and industry-standard technologies such as TLS enable secure local AI inference.

  • Simple REST API

    Developers use the popular OpenAI API REST protocol for AI operations. Existing OpenAI SDK clients can also use the Private AI Services Container.

  • No GPU Required

    No specialized hardware is required. The container can run on a system with a single x86-64 CPU and 16 GB of RAM.

Key Features of Oracle Private AI Services Container

REST APIs

A REST API enables clients to perform common AI operations. The REST API implements the OpenAI API, so existing OpenAI SDK clients work with the container; you just need to change the REST endpoint and API key.
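As a sketch of the endpoint-and-key change, the following standard-library Python builds (without sending) an OpenAI-compatible embeddings request. The base URL and API key are hypothetical placeholders for your own deployment; all-MiniLM-L12-v2 is one of the bundled embedding models.

```python
import json
import urllib.request

# Hypothetical values -- substitute the endpoint and API key for your
# own Private AI Services Container deployment.
BASE_URL = "https://paisc.example.com:8443/v1"
API_KEY = "my-api-key"

def build_embeddings_request(text, model="all-MiniLM-L12-v2"):
    """Build an OpenAI-compatible embeddings request without sending it."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/embeddings",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # API-key authentication
        },
        method="POST",
    )

req = build_embeddings_request("Oracle Private AI Services Container")
# urllib.request.urlopen(req) would return an OpenAI-style JSON body
# whose data[0]["embedding"] field holds the vector.
```

With the OpenAI Python SDK, the equivalent change is constructing the client with your own base_url and api_key; the rest of the client code is unchanged.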

Security and Privacy

All REST traffic is secured with TLS 1.3. Passwords are encrypted and kept in a PKCS12 keystore. Authentication and authorization use API keys. SELinux is enabled. User data is not stored in the container.
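On the client side, you can likewise insist that your own connections never negotiate anything older than TLS 1.3. A minimal standard-library Python sketch (this configures the client, not the container itself):

```python
import ssl

# Require TLS 1.3 for outgoing connections to the container; anything
# older is refused during the handshake.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_3

# ctx can then be passed to, e.g., urllib.request.urlopen(url, context=ctx)
```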

Air Gap Enabled

The Private AI Services Container is a web service designed to be deployed in your data center, with no dependency on public clouds or the internet. This design enables the container to run in air-gapped environments.

Embedding Models

Six popular embedding models ship with the container, so customers can deploy a working system without the extra effort of creating and downloading embedding models themselves. If needed, customers can also create and download additional embedding models.

Containerized

Podman, Kubernetes, or OpenShift can be used to manage the container. Because the container is stateless, it supports horizontal scaling and high availability. For small deployments, use Podman; for larger deployments, use Kubernetes or OpenShift.

Multi-threaded

The ONNX Runtime uses multi-threading, enabling better throughput and lower latency on multi-core CPUs. Depending on the workload, threads are applied automatically within a single request or across concurrent requests.
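The thread pools themselves are internal to the ONNX Runtime inside the container, but the across-requests case is easy to picture from the client side: several embedding requests issued concurrently can be served in parallel. In this sketch, embed is a stub standing in for a real call to the container:

```python
from concurrent.futures import ThreadPoolExecutor

def embed(text):
    # Stub: a real client would POST `text` to the container's
    # embeddings endpoint and return the resulting vector.
    return [float(len(text))]

texts = ["first document", "second document", "third document"]

# Issue the requests concurrently; server-side, ONNX Runtime can
# spread such independent requests across CPU cores.
with ThreadPoolExecutor(max_workers=4) as pool:
    vectors = list(pool.map(embed, texts))
```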

Embedding Models that ship with the Container

  • all-mpnet-base-v2

    An English sentence-transformer model suitable for semantic similarity search and clustering use-cases. This model is larger and slower than all-MiniLM-L12-v2, but more accurate.

    • Dimensions: 768
    • Model size: 0.1 B params
    • File size: 105 MB

  • all-MiniLM-L12-v2

    An English sentence-transformer model suitable for semantic similarity search and clustering use-cases. This model is smaller and faster than all-mpnet-base-v2, but less accurate.

    • Dimensions: 384
    • Model size: 34 M params
    • File size: 134 MB

  • multilingual-e5-base

    A multilingual sentence-transformer model that supports over 100 languages. Supports text embedding, translation, and multilingual understanding. This model is smaller and faster than multilingual-e5-large, but less accurate.

    • Dimensions: 768
    • Model size: 0.3 B params
    • File size: 1.2 GB

  • multilingual-e5-large

    A multilingual sentence-transformer model that supports over 100 languages. Supports text embedding, translation, and multilingual understanding. This model is larger and slower than multilingual-e5-base, but more accurate.

    • Dimensions: 1024
    • Model size: 0.6 B params
    • File size: 2.1 GB

  • clip-vit-base-patch32-txt

    A text encoder that produces embeddings compatible with the CLIP image encoder. Enables text-to-image search and similarity matching between text descriptions and images.

    • Dimensions: 512
    • Model size: 151 M params
    • File size: 256 MB

  • clip-vit-base-patch32-img

    An image encoder that produces embeddings compatible with the CLIP text encoder. Enables image-to-text search and similarity matching between images and text descriptions.

    • Dimensions: 512
    • Model size: 151 M params
    • File size: 352 MB
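For semantic similarity search with any of these models, the returned vectors are typically compared with cosine similarity. A minimal, self-contained sketch; the vectors here are toy 3-dimensional examples, whereas a model such as all-mpnet-base-v2 returns 768 dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.1, 0.3, 0.5]
doc_close = [0.2, 0.6, 1.0]   # same direction as the query
doc_far = [0.5, -0.2, -0.1]   # mostly unrelated direction

cosine_similarity(query, doc_close)  # close to 1.0
cosine_similarity(query, doc_far)    # much lower
```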

March 24, 2026

Getting Started with Oracle Private AI Services Container

Doug Hood, Product Manager, Oracle

Following our recent announcement that the Oracle Private AI Services Container is now available on Oracle Container Registry, it’s time to move from “what’s new” to “how it works”. In this post we’ll walk through installing, configuring, and using the container in your own environment.