Vector Index Service

The Private AI Services Container Vector Index Service enables In-Memory Neighbor Graph Vector Indexes to be created faster by using GPU offload. The SQL syntax for CREATE VECTOR INDEX can be used to define the REST endpoint and credential for the Private AI Services Container. Most modern NVIDIA GPUs are supported on the Private AI Services Container for vector index offload.

Getting Started with the Vector Index Service

Vector Index Creation Offload to GPU

The Processing Challenge

Creating vector indexes is resource intensive and time consuming. The algorithm to create an In-Memory Neighbor Graph Vector Index (eg HNSW, or Hierarchical Navigable Small World) requires a lot of vector distance calculations.

With CPU-based systems, memory accesses for the In-Memory Neighbor Graph can become a bottleneck

The larger the vector index, the more vector distance calculations that are needed. The more vector distance calculations, the longer that it takes.

Even though an In-Memory Neighbor Graph Vector Index can be created in parallel using many CPU cores, it can still take hours to create a large HNSW vector index.

GPU Offload

It is possible to offload some of the processing for creating an In-Memory Neighbor Graph Vector Index to a GPU. The high memory bandwidth and parallel processing of a GPU can enable impressive speedups.

A GPU can be used to create a graph from a set of vectors. The resulting graph can then be converted into an Oracle In-Memory Neighbor Graph Vector Index.

This can all be done with a CREATE VECTOR INDEX statement on Oracle AI Database 26ai, and deployment of a Private AI Services Container on a GPU instance.

Image courtesy of NVIDIA.

HNSW Vector Index GPU Offload

The GPU offload process is described below:

CREATE VECTOR INDEX for HNSW on Oracle AI Database
Vectors sent to Compute Offload Server in container
Graph created from vectors
Graph returned to Compute Offload Server
Graph sent back to Oracle AI Database
HNSW Vector Index available for use

Vector Index Creation Performance

Creating an In-Memory Neighbor Graph Vector Index is computationally expensive and time consuming.

Using a CPU algorithm without SIMD techniques is the slowest option for creating In-Memory Neighbor Graph Vector Indexes.

Using SIMD (Single Instruction Multiple Data) techniques on x86-64 CPUs with avx512 can enable up to a 10x speedup for In-Memory Neighbor Graph Vector Index creation.

Using a GPU with the NVIDIA cuVS library using high bandwidth VRAM with Tensor and CUDA cores can enable up to a 10x speedup over SIMD for In-Memory Neighbor Graph Vector Index creation.

Supported NVIDIA GPUs

RTX Pro 4000 Blackwell

This GPU uses uses the Blackwell architecture.

It has:

16 GB VRAM
288 GB/sec memory bandwidth

Image courtesy of NVIDIA.

RTX Pro 4000 Blackwell

This GPU uses uses the Blackwell architecture.

It has:

24 GB VRAM
672 GB/sec memory bandwidth

Image courtesy of NVIDIA.

RTX Pro 6000 Workstation Edition

This GPU uses the Blackwell architecture.

It has:

96GB VRAM
1.8 TB/sec memory bandwidth

Image courtesy of NVIDIA.

NVIDIA A10 GPU

The A10 uses the Ampere architecture.

The A10 has:

24 GB of VRAM
600 GB/sec memory bandwidth

OCI compute shapes:

VM.GPU.A10.1
VM.GPU.A10.2
BM.GPU.A10.4

Image courtesy of NVIDIA

NVIDIA L40s GPU

The L40s uses the Ada Lovelace architecture.

The L40s has:

48 GB of VRAM
864 GB/sec memory bandwidth

OCI compute shapes:

BM.GPU.L40S.4
Oracle Cloud Compute @ Customer

Image courtesy of NVIDIA.

A100 Tensor Core GPU

The A100 uses the Ampere architecture.

The A100 has:

80 GB VRAM
2039 GB/sec memory bandwidth

OCI compute shapes:

BM.GPU.A100
BM.GPU.A100-v2.8
BM.GPU4
BM.GPU4.8

Image courtesy of NVIDIA.

H100 Tensor Core GPU

The H100 uses the Hopper architecture.

The H100 has

94 GB VRAM
3.9 TB/sec memory bandwidth

OCI compute shapes:

BM.GPU.H100.8

Image courtesy of NVIDIA.

NVIDIA H200 GPU

The H200 uses the Hopper architecture.

The H200 has

141 GB VRAM
4.8 TB/sec memory bandwidth

OCI compute shapes:

BM.GPU.H200.8

Image courtesy of NVIDIA.

NVIDIA Blackwell GPU

The B200 uses the Blackwell architecture.

Each B200 has:

180 GB VRAM
8TB/sec memory bandwidth

OCI compute shapes:

BM.GPU.B200.8

Image courtesy of NVIDIA.

Using Vector Index Offload

CREATE VECTOR INDEX example

The existing Oracle AI Database 26ai syntax to create an In-Memory Neighbor Graph Vector Index is extended to include the credential and URL for the Oracle Private AI Services Container.

The offload_credential_name is the API KEY for the Private AI Services Container.

All communication with the Private AI Service Container uses SSL (TLS 1.3).

Container Configuration

The Vector Index Service of the Private AI Services Container can be deployed with Podman as the container runtime.

The x86-64 Linux host machine that the container requires:

Oracle Linux 8.10+, 9 or 10
A supported NVIDIA GPU
NVIDIA drivers for 580.65.06 or later
NVIDIA Container Tool Kit 1.19.0 or later