What Is Chroma? An Open Source Embedded Database

Aaron Ricadela | Senior Writer | April 15, 2024

AI development teams building similarity search applications that use large language models and unstructured data sets are turning to vector databases designed to quickly compare the characteristics of millions or billions of data points.

These specialized vector databases, a new breed from companies such as Chroma, Pinecone, Qdrant, Weaviate, and Zilliz, compare relationships among vector embeddings: numerical representations of unstructured content in high-dimensional space, whose relative positions reflect semantic relationships among the data set’s features. Vector databases power AI applications that search image and video content, recommend products or streaming media, find more-relevant information based on users’ intent, and supplement AI chatbot queries with businesses’ proprietary data through retrieval-augmented generation (RAG). Vector databases excel at finding approximate nearest neighbor (ANN) matches, trading a small amount of accuracy for much faster searches over large data sets.
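The following is a minimal, dependency-free Python sketch of one classic ANN technique: an inverted-file (IVF) index. It groups vectors by their closest centroid, and a query searches only the group whose centroid is nearest. The points and centroids are hypothetical. The centroids are fixed by hand, whereas a real IVF index would learn them, for example with k-means. The sketch illustrates the accuracy-for-speed trade-off and does not show how Chroma or any other product named above implements search.

```python
import math

# Hypothetical 2-D vectors; real embeddings have hundreds or thousands of dimensions.
vectors = {
    "a": (0.0, 0.0),
    "b": (1.0, 0.0),
    "c": (6.0, 0.0),
    "d": (10.0, 0.0),
}
# Hand-picked centroids (a real IVF index would learn these, e.g. with k-means).
centroids = [(0.0, 0.0), (10.0, 0.0)]


def closest_centroid(point):
    """Index of the centroid nearest to `point` by Euclidean distance."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def exact_nearest(query):
    """Exhaustive scan: always correct, but cost grows with the data set."""
    return min(vectors, key=lambda key: math.dist(query, vectors[key]))


def build_ivf():
    """Group vector keys into one inverted list per centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for key, vec in vectors.items():
        lists[closest_centroid(vec)].append(key)
    return lists


def ivf_nearest(query, lists):
    """Search only the list whose centroid is nearest the query.
    Less work, but the true nearest neighbor may sit in another list."""
    candidates = lists[closest_centroid(query)]
    if not candidates:
        return None
    return min(candidates, key=lambda key: math.dist(query, vectors[key]))


lists = build_ivf()  # {0: ['a', 'b'], 1: ['c', 'd']}
query = (4.5, 0.0)
print("exact:", exact_nearest(query))             # exact: c
print("approximate:", ivf_nearest(query, lists))  # approximate: b
```

Here the IVF probe returns `b`, while an exhaustive scan finds `c`. The reason is that `c` was filed under the other centroid. Production systems reduce such misses by probing several lists or by using other structures, such as graph-based HNSW indexes.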

What Is ChromaDB?

The Chroma open source database, made by the eponymous San Francisco startup, lets developers build applications including ANN search, image retrieval, RAG, and ecommerce recommenders. It’s known as a lightweight vector database that developers can run on a laptop for rapid prototyping, as well as in public or private cloud services. Chroma employs the Apache Arrow data format for fast data access.

Development teams can run Chroma in client/server mode on a single node, deploying it in a Docker container or on a virtual machine hosted in a public cloud service. They can also run the database in Chroma Cloud, the company’s managed service, with deployments on Amazon Web Services, Google Cloud Platform, and Microsoft Azure. Chroma is available under the Apache 2.0 license, which allows commercial use.

The developer-friendly database supports embedding models from OpenAI, Google, Cohere, Hugging Face, and others. It offers Python and JavaScript integrations with LangChain, LlamaIndex, and Braintrust, as well as Python integrations with AI tools such as Streamlit. Clients are available for a variety of languages, including Python, JavaScript, Ruby, Java, Go, C#, Elixir, and Rust.
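An embedding model turns raw content into the vectors that a database like Chroma stores and compares. The sketch below is a hypothetical stand-in written in plain Python. The function `toy_embed` hashes each word into a fixed number of buckets and normalizes the counts, so texts that share words get a higher dot product. It captures only word overlap, not meaning, and it is not one of the embedding integrations listed above. In practice you would call a model from one of those providers instead.

```python
import hashlib
import math


def toy_embed(texts: list[str], dim: int = 256) -> list[list[float]]:
    """Hypothetical stand-in for an embedding model (feature hashing).

    Each lowercase word is hashed into one of `dim` buckets; bucket counts are
    then scaled to unit length. Shared words raise similarity; meaning does not.
    """
    vectors = []
    for text in texts:
        vec = [0.0] * dim
        for word in text.lower().split():
            # sha256 gives a stable bucket across runs, unlike Python's salted hash().
            digest = hashlib.sha256(word.encode("utf-8")).digest()
            vec[int.from_bytes(digest[:8], "big") % dim] += 1.0
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by zero
        vectors.append([x / norm for x in vec])
    return vectors


def dot(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


docs = [
    "vector databases store embeddings",
    "vector databases store embeddings quickly",
    "bake bread at home",
]
e = toy_embed(docs)
print(round(dot(e[0], e[1]), 3), round(dot(e[0], e[2]), 3))
```

The first two texts share four words, so they score far higher than the unrelated third text. A real embedding model would also place texts with similar meaning but different wording close together.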

Why Oracle Excels over Chroma in Vector Databases

Oracle Database 23ai’s AI Vector Search offers a much broader range of high availability and security options than Chroma and other standalone vector databases. Oracle Virtual Private Database and Oracle Label Security let organizations control internal access to data based on users’ roles or permission levels. Oracle Real Application Clusters let database instances run on a cluster of servers for fault tolerance and load balancing. Oracle Active Data Guard replicates changes to a standby database in real time, protecting against data loss in an unplanned outage.

Oracle Database 23ai can also optimize vector searches by deciding whether additional filters, often stored in relational columns, should be applied before, during, or after the vector search. For example, when a similarity search application requests the top-K results, Oracle Database can choose whether to narrow the candidates by their relational attributes before the vector search runs, while it executes, or after it completes.
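The following hypothetical Python sketch shows why the order matters. With post-filtering, the vector search picks the top K first and the relational filter then discards rows, so fewer than K results may survive. With pre-filtering, only rows that pass the filter are ranked. The product rows and the `in_stock` column are invented for illustration. The sketch does not reproduce Oracle's optimizer logic, which makes this choice automatically.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def rank(query, rows):
    """All rows ordered from most to least similar to the query."""
    return sorted(rows, key=lambda r: cosine(query, r["embedding"]), reverse=True)


def post_filter_top_k(query, rows, k, predicate):
    """Vector search first, relational filter second: may return fewer than k rows."""
    return [r["id"] for r in rank(query, rows)[:k] if predicate(r)]


def pre_filter_top_k(query, rows, k, predicate):
    """Relational filter first, then rank only the rows that passed."""
    return [r["id"] for r in rank(query, [r for r in rows if predicate(r)])[:k]]


def in_stock(row) -> bool:
    return row["in_stock"]


# Hypothetical product rows: an embedding plus a relational attribute.
rows = [
    {"id": "p1", "embedding": [0.9, 0.1], "in_stock": False},
    {"id": "p2", "embedding": [0.8, 0.2], "in_stock": False},
    {"id": "p3", "embedding": [0.7, 0.3], "in_stock": True},
    {"id": "p4", "embedding": [0.1, 0.9], "in_stock": True},
]
query = [1.0, 0.0]
print(post_filter_top_k(query, rows, 2, in_stock))  # []
print(pre_filter_top_k(query, rows, 2, in_stock))   # ['p3', 'p4']
```

Pre-filtering returns K qualifying results whenever enough exist, but it must evaluate the filter first. Post-filtering is cheap when most rows pass the filter but can come back short when few do. Weighing that trade-off per query is the decision described above.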


ChromaDB FAQs

What is Chroma and how does it work?

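Chroma is a lightweight vector database for building applications powered by similarity search over vector embeddings. It stores embeddings in collections alongside their IDs, source documents, and metadata, and a query returns the stored items whose embeddings are closest to the query’s. It also includes an object storage layer to lower the cost of serving vector indexes for similarity search.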
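As a rough model of how a vector collection stores and searches embeddings, here is a hypothetical in-memory collection in Python. The `add` and `query` method names loosely echo the calls on a Chroma collection. However, the class, its signatures, and the brute-force squared-Euclidean scan are assumptions made for illustration. Chroma can compute embeddings through an embedding function and serves queries from an index rather than scanning every vector.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCollection:
    """Hypothetical in-memory vector collection; not Chroma's implementation."""

    ids: list[str] = field(default_factory=list)
    embeddings: list[list[float]] = field(default_factory=list)
    documents: list[str] = field(default_factory=list)

    def add(self, ids: list[str], embeddings: list[list[float]], documents: list[str]) -> None:
        """Store parallel lists of ids, vectors, and source documents."""
        if not (len(ids) == len(embeddings) == len(documents)):
            raise ValueError("ids, embeddings, and documents must be the same length")
        self.ids.extend(ids)
        self.embeddings.extend(embeddings)
        self.documents.extend(documents)

    def query(self, query_embedding: list[float], n_results: int = 1) -> list[tuple[str, float, str]]:
        """Return (id, distance, document) for the closest stored vectors.

        Uses a brute-force scan with squared Euclidean distance (smaller is closer).
        """
        scored = sorted(
            (sum((q - x) ** 2 for q, x in zip(query_embedding, emb)), i)
            for i, emb in enumerate(self.embeddings)
        )
        return [(self.ids[i], dist, self.documents[i]) for dist, i in scored[:n_results]]


col = ToyCollection()
col.add(
    ids=["doc1", "doc2", "doc3"],
    embeddings=[[0.1, 0.9], [0.8, 0.2], [0.75, 0.3]],  # hypothetical precomputed vectors
    documents=["Refund policy", "Shipping times", "Delivery options"],
)
for doc_id, dist, text in col.query([0.78, 0.25], n_results=2):
    print(doc_id, round(dist, 4), text)
```

The query vector sits between `doc2` and `doc3`, so those two come back first, ordered by distance.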

What makes Chroma different from other vector databases?

The Chroma database is aimed at developers working on small projects and can be installed on a laptop for rapid prototyping without big hardware commitments. It also supports a wide range of programming languages and AI tools.

Can Chroma handle both structured and unstructured data?

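Chroma is designed for storing and searching vector embeddings, not for processing structured, relational data. It can store metadata alongside each embedding and filter query results on that metadata, but it is not a substitute for a relational database.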

Is Chroma open source?

Yes, Chroma is available under the open source Apache 2.0 license, which lets users redistribute its code in their own products.