Vector Embedding Service

The Vector Embedding Service in the Private AI Services Container generates vector embeddings from unstructured data.

The container is a REST server that implements the OpenAI API protocol, creating vectors through the /v1/embeddings endpoint. The endpoint, API key, and model name determine the type of vectors that will be created.

REST clients include Oracle AI Database 26ai, OpenAI API clients, Postman, and curl.

Vector Embedding Models

What is a Vector?

A vector (also called an embedding) is an array of numbers. The numbers are usually floating point (FLOAT32 or FLOAT64), but they can also be integer (INT8) or binary. The numbers represent the meaning (or semantics) of the unstructured data, whether it is text, images, audio, video, or DNA. The distance between two vectors measures their relatedness: small distances suggest high relatedness and large distances suggest low relatedness.
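The idea that distance measures relatedness can be illustrated with a short sketch. The three-dimensional vectors below are made up for illustration; real embeddings typically have hundreds or thousands of dimensions:

```python
import math

def cosine_distance(a, b):
    """Cosine distance: 0 means same direction (high relatedness)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

cat    = [0.9, 0.1, 0.2]   # hypothetical embedding for "cat"
kitten = [0.8, 0.2, 0.2]   # similar meaning: small distance from "cat"
car    = [0.1, 0.9, 0.7]   # unrelated meaning: large distance from "cat"

print(cosine_distance(cat, kitten))  # small
print(cosine_distance(cat, car))     # large
```

Cosine distance is one common metric; Euclidean and dot-product distances are also widely used with embeddings.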

What is a Vector Embedding Model?

Vector embedding models are AI models that convert unstructured data into vectors. They are pre-trained neural networks, and most are some form of transformer; Sentence Transformers and Vision Transformers are examples.

Clients Using Vector Embedding Models in the Container

PLSQL REST call

The PLSQL procedure UTL_TO_EMBEDDING from the DBMS_VECTOR package can make a REST call to an endpoint like openai.com, cohere.com, OCI GenAI Services, or the Private AI Services Container.

The URL, credential (API key), and model determine the vector embedding model that will be used to create a set of vectors.

Under the covers, UTL_TO_EMBEDDING performs an HTTPS POST to the /v1/embeddings endpoint.

Embedding with Postman

The Private AI Services Container is a REST server which uses the OpenAI API endpoint /v1/embeddings to create vectors.

Postman can be used to POST to the /v1/embeddings endpoint with a model name and a message to vectorize.
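The request Postman sends can be sketched with Python's standard library. The host, API key, and model name below are placeholder assumptions, not values from this document:

```python
import json
import urllib.request

def build_embeddings_request(base_url, api_key, model, texts):
    """Build the same POST to /v1/embeddings that Postman would send."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/embeddings",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

# Placeholder host, key, and model name for illustration.
req = build_embeddings_request(
    "https://paisc.example.com", "my-api-key",
    "all-MiniLM-L6-v2", ["The quick brown fox"])
# urllib.request.urlopen(req) would return the OpenAI-style JSON response,
# with the vectors under data[i].embedding.
```

In Postman, the same pieces map to the request URL, a Bearer Token authorization header, and a raw JSON body with "model" and "input" fields.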

OpenAI API Python Client

The standard OpenAI API Python client can be used with two changes:

  • The base_url points to the Private AI Services Container instance
  • The API Key is for the Private AI Services Container

OpenAI API JavaScript Client

The standard OpenAI API JavaScript client can be used with two changes:

  • The base_url points to the Private AI Services Container instance
  • The API Key is for the Private AI Services Container

OpenAI API .NET Client

The standard OpenAI API C# .NET client can be used with some changes:

  • The base_url points to the Private AI Services Container instance
  • The API Key is for the Private AI Services Container
  • The x509 digital certificate needed for SSL is loaded from disk

OpenAI API Java Client

The standard OpenAI API Java client can be used with two changes:

  • The base_url points to the Private AI Services Container instance
  • The API Key is for the Private AI Services Container

Use Cases

ONNX in the Database

Vector Embeddings are created in Oracle AI Database 26ai.

The SQL function vector_embedding is used to create the vectors.

The Private AI Services Container is not needed in this scenario.

There is increased CPU utilization in Oracle AI Database 26ai due to the overhead of creating the data and query vectors.

PLSQL Client

PLSQL sends query or data vector requests to the Private AI Services Container.

The database has a normal workload profile with minimal CPU utilization, enabling 10x to 100x more users than CPU cores.

Mid-tier Client

The query vectors can be created in the Private AI Services Container.

Oracle AI Database 26ai does not consume CPU creating query vectors.

The database has a normal workload profile with minimal CPU utilization, enabling 10x to 100x more users than CPU cores.