AI Solution

Leveraging Open Neural Network Exchange Models to Vectorize Content in PDFs

Introduction

Querying an external large language model (LLM) will often answer a question, but searching internal corporate knowledge repositories and data sets has somewhat different requirements.

Imagine an organization conducting internal research that has several PDFs it wants searched for relevant answers by an AI search engine rather than by public LLMs. Combining traditional relational database management system (RDBMS) queries with generative AI queries can also make the search more powerful.

This solution demonstrates how to apply Open Neural Network Exchange (ONNX) concepts, create our own ONNX models, and use those models to read PDFs and vectorize their content. The end result is an Oracle APEX vector-based search engine that can query internal knowledge repositories and, in some cases, external LLMs.
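To make the idea of vector-based search concrete, here is a minimal Python sketch of the general technique: turn text chunks into vectors, then rank them by cosine similarity to a query vector. It is not this solution's implementation. The toy_embed function is a hypothetical word-count stand-in for a real ONNX embedding model, the chunks are invented sample text rather than content read from PDFs, and all names are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an ONNX embedding model.

    Returns a sparse word-count vector. A real embedding model would
    return a dense vector that captures meaning, not just shared words.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, chunks, top_k=2):
    """Rank text chunks by similarity to the query, best match first."""
    query_vec = toy_embed(query)
    scored = [(cosine_similarity(query_vec, toy_embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Invented sample chunks standing in for text extracted from PDFs.
chunks = [
    "Quarterly revenue grew in the cloud infrastructure segment.",
    "The research team evaluated embedding models for document search.",
    "Employee travel policy requires manager approval.",
]

results = search("which embedding models were evaluated for search", chunks, top_k=1)
for score, chunk in results:
    print(f"{score:.2f}  {chunk}")
```

In a database-backed setup, the embedding and similarity steps would typically be handled by the embedding model and the database's vector search rather than by application code like this. The sketch only shows the ranking idea.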

Demo

Demo: Leveraging Open Neural Network Exchange Models to Vectorize Content in PDFs (1:57)

Prerequisites and setup

  1. Oracle Cloud account—sign-up page
  2. Oracle Database 23ai—documentation
  3. Oracle Machine Learning for Python—documentation
  4. ONNX—documentation
  5. Oracle APEX—documentation