Oracle Private AI Services Container FAQ

The Oracle Private AI Services Container gives Oracle AI Database customers a private, air-gap-capable, OpenAI-style inference layer that keeps embedding work off the database server while still fitting naturally into Oracle-native workflows. It protects Oracle AI Database performance and keeps AI inference inside your security boundary.

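Because the container exposes an OpenAI-style inference layer, clients can talk to it using the standard OpenAI embeddings request shape. The sketch below builds such a request in Python; the host, port, and model name are illustrative assumptions, not documented defaults for the container.

```python
# Sketch: building an OpenAI-compatible /v1/embeddings request for the
# container. Host, port, and model name are assumptions for illustration.
import json
import urllib.request


def build_embedding_request(base_url, model, texts):
    """Build (but do not send) an OpenAI-style embeddings POST request."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_embedding_request(
    "http://localhost:8000",   # assumed container address
    "example-embedding-model", # placeholder model name
    ["hello vector search"],
)
# Sending this with urllib.request.urlopen(req) would return JSON whose
# "data" field is a list of {"embedding": [...]} objects, following the
# OpenAI embeddings response shape.
```

Because the request/response shape follows the OpenAI convention, existing OpenAI client libraries can typically be pointed at the container by overriding their base URL.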
Frequently Asked Questions

  • General Questions

    • Which services are provided by the container?
    • How do you communicate with the container?
    • How much memory is needed?
    • How many CPU cores are needed?
    • How much disk space is needed?
    • Are GPUs needed?
    • Is the container free to use?
    • Which third-party licenses does the Private AI Services Container use?
  • Configuration Questions

    • How do you control the log level of the container?
    • Which TCP ports does the container use?
    • Is the config.json file always needed?
  • Embedding Model Questions

    • Which vector embedding models ship with the container?
    • How do I use other embedding models in the container?
    • Which embedding models work with the container?
    • Which vector embedding model should I use?
    • Can the container create OpenAI-compatible vectors?
  • Deployment Questions

    • Which container runtimes are supported?
    • Can more than one container be used?
    • How is high availability supported?
    • Does the container run on the Oracle AI Database server?
    • Where can the container be run?
    • Can the container run in an air-gapped environment?
    • What Linux software is required to run the container?
    • How many concurrent users are supported?
  • Security Questions

    • Is SSL supported?
    • How are the passwords stored on the container?
    • How are API Keys used?
    • Is user data stored in the container?
    • What security does the container use?
  • Vector Index Service Questions

    • Is a GPU required?
    • Will it work on my GeForce RTX 3070?
    • Will it work on an NVIDIA DGX Spark or Jetson?
    • How much VRAM is needed?
    • How much RAM is needed?
    • Which version of the NVIDIA drivers is needed?
    • Which version of the NVIDIA Container Toolkit is needed?
    • How do I choose which of my installed GPUs to use?
    • Why is the container not starting?
    • How can I check whether the container is running?
    • Will this work with HNSW indexes that are local or have include columns?
    • Will this work with IVF indexes?
    • Will this work with RAC, Exadata or sharded databases?
    • What parameters do I need in my CREATE VECTOR INDEX statement?
    • Why can the database not communicate with the container?