Chroma 0.3, The Local-First Vector Database for Notebook-Scale Prototyping

April 24, 2023 · 7 min read · by Muhammad Amal ai

TL;DR:
Chroma is the right tool for prototypes, notebooks, and small single-user apps. It is not a production database in April 2023, despite what some demos suggest.
Embedded mode runs in-process with zero infrastructure, which is the killer feature.
Persistence is DuckDB-backed and works for tens of thousands of vectors. Past that, you’ll hit limits.

I keep Chroma in my toolkit for the same reason I keep SQLite: when the problem is small enough, “no infrastructure” beats “the right infrastructure.” Most of the retrieval ideas I’ve shipped started life as Chroma notebooks, and several stayed that way for weeks before earning a promotion to Pinecone or Weaviate.

This post is a practical walkthrough of Chroma 0.3.x: when to use it, how to use it well, and how to know when you’ve outgrown it. If you’ve already decided you need a real production vector DB, the Weaviate hybrid search post from earlier this week is probably the better starting point. If you’re still in exploration mode, this is the post.

I’ll be clear about the limitations because there’s a fair amount of marketing around Chroma that papers over them. The team is moving fast and most of what I’ll flag will improve, but it’s where things stand right now.

What Chroma is, and isn’t

Chroma is a Python library that gives you a vector store API backed by DuckDB for metadata and a local HNSW index for vectors. By default it runs entirely in your Python process. There’s an optional client-server mode where Chroma runs as a separate process and you connect over HTTP, but the embedded mode is the more interesting one.

The right mental model: think of Chroma as the SQLite of vector databases, not the Pinecone of vector databases. The API is intentionally small. The deployment model is “just import it.” The performance is fine for the scales it’s designed for. None of this maps onto running a multi-tenant production search system.

Installation and first steps

pip install chromadb==0.3.21 sentence-transformers==2.2.2

The simplest possible usage:

import chromadb

client = chromadb.Client()
collection = client.create_collection("docs")

collection.add(
    documents=["the quick brown fox", "a lazy dog sleeps"],
    metadatas=[{"source": "blog"}, {"source": "story"}],
    ids=["doc-1", "doc-2"],
)

results = collection.query(
    query_texts=["fast animal"],
    n_results=2,
)
print(results)

A few things are happening here implicitly. Chroma is using a default embedding function (sentence-transformers all-MiniLM-L6-v2) under the hood because I gave it documents instead of embeddings. It downloaded the model on first use. The collection lives entirely in memory and disappears when the Python process exits.

That’s the friction-free starting point. For real work, you want to control the embedding function and persist the data.

Persistence

Chroma persists to a local directory via DuckDB:

import chromadb
from chromadb.config import Settings

client = chromadb.Client(Settings(
    chroma_db_impl="duckdb+parquet",
    persist_directory="./chroma_store",
))

collection = client.get_or_create_collection("docs")
# ... do work ...
client.persist()

The persist() call writes pending changes to disk. The naming is slightly confusing: data is always tracked in DuckDB, but the in-memory state isn’t durable until you call persist(). In practice I call it at the end of each ingestion run and at process shutdown.
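
Since a forgotten persist() is easy to miss, one option is to wrap ingestion in a small context manager. This is a minimal sketch rather than anything Chroma ships: persisting is a hypothetical helper name, and it only assumes the client object exposes the persist() method shown above.

```python
from contextlib import contextmanager


@contextmanager
def persisting(client):
    """Yield `client`, then call client.persist() when the block exits.

    Hypothetical helper, not part of chromadb. It only assumes `client`
    has a persist() method, as the 0.3.x embedded client does.
    """
    try:
        yield client
    finally:
        # Runs on normal exit and on exceptions, but not if the
        # process is killed outright (e.g. SIGKILL).
        client.persist()


# Usage, assuming `client` was built with a persist_directory:
#
# with persisting(client) as c:
#     c.get_or_create_collection("docs").add(...)
```

The finally clause means a failed ingestion run still flushes whatever was added before the error, which may or may not be what you want; drop it to a plain call after the block if partial writes are worse than lost ones.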

The persistence layer scales fine to ~100K vectors with metadata in my experience. Past that, query latency starts to climb noticeably and the persistence operation itself gets slow.

Custom embedding functions

For real work I want to control the embedding model:

from chromadb.utils import embedding_functions

sbert_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="sentence-transformers/all-mpnet-base-v2",
)

collection = client.get_or_create_collection(
    name="docs",
    embedding_function=sbert_fn,
    metadata={"hnsw:space": "cosine"},
)

There’s also a built-in OpenAI embedding function if you’d rather use text-embedding-ada-002:

openai_fn = embedding_functions.OpenAIEmbeddingFunction(
    api_key="...",
    model_name="text-embedding-ada-002",
)

The metadata={"hnsw:space": "cosine"} parameter sets the distance metric. Options are l2 (default), cosine, and ip (inner product). For text embeddings, cosine or inner product (with normalized vectors) is what you want. The L2 default ranks results the same way as cosine only when every vector is unit-length; with unnormalized embeddings the rankings diverge.
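
To make the metric choice concrete, here is a small pure-Python sketch with no Chroma involved. The vectors and helper names are made up for illustration. It ranks two toy documents against a query under L2 and cosine, then repeats the L2 ranking after normalizing everything to unit length.

```python
import math


def l2_squared(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def rank(query, docs, distance):
    """Return document indices ordered from nearest to farthest."""
    return sorted(range(len(docs)), key=lambda i: distance(query, docs[i]))


query = [1.0, 0.0]
docs = [
    [10.0, 1.0],  # almost the same direction as the query, but long
    [0.9, 0.5],   # nearby in space, but pointing further off-axis
]

l2_order = rank(query, docs, l2_squared)
cosine_order = rank(query, docs, cosine_distance)
l2_unit_order = rank(normalize(query), [normalize(d) for d in docs], l2_squared)
```

On these toy vectors L2 prefers the second document while cosine prefers the first, and once everything is normalized the L2 ordering agrees with cosine. That is the whole reason the metric setting matters for embedding models that don't return unit-length vectors.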

Adding documents

The add method takes documents, metadatas, and IDs. To supply pre-computed embeddings and bypass the embedding function, pass embeddings instead of or alongside documents. The document-only form looks like this:

collection.add(
    documents=["paragraph one text", "paragraph two text"],
    metadatas=[
        {"doc_id": "doc-1", "position": 0},
        {"doc_id": "doc-1", "position": 1},
    ],
    ids=["doc-1:0", "doc-1:1"],
)

If the embedding function is set, Chroma calls it on the documents when you add them. If you pass embeddings directly, the embedding function isn’t called. I usually pre-compute embeddings in a separate step because it makes the pipeline easier to reason about and lets me batch the OpenAI calls properly.

import openai

def embed_batch(texts):
    response = openai.Embedding.create(
        input=texts,
        model="text-embedding-ada-002",
    )
    return [item["embedding"] for item in response["data"]]

texts = ["paragraph one text", "paragraph two text"]
vectors = embed_batch(texts)

collection.add(
    embeddings=vectors,
    documents=texts,
    metadatas=[
        {"doc_id": "doc-1", "position": 0},
        {"doc_id": "doc-1", "position": 1},
    ],
    ids=["doc-1:0", "doc-1:1"],
)

Chroma in 0.3.x doesn’t have built-in batching for add, so I batch myself at the application layer. Batches of 100 are a reasonable upper bound; past that you can hit memory pressure during the HNSW insert.
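
The application-layer batching can be as simple as slicing the parallel lists. The sketch below is one way to do it: add_in_batches is a hypothetical helper name, and it assumes only that the collection's add() accepts the keyword arguments used earlier in this post.

```python
def add_in_batches(collection, ids, documents, metadatas,
                   embeddings=None, batch_size=100):
    """Call collection.add() once per slice of at most `batch_size` items.

    Hypothetical helper: `collection` is anything with an add() method
    taking ids, documents, metadatas, and optionally embeddings.
    Returns the number of add() calls made.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if not (len(ids) == len(documents) == len(metadatas)):
        raise ValueError("ids, documents, and metadatas must be the same length")
    if embeddings is not None and len(embeddings) != len(ids):
        raise ValueError("embeddings must be the same length as ids")

    calls = 0
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        kwargs = {
            "ids": ids[start:end],
            "documents": documents[start:end],
            "metadatas": metadatas[start:end],
        }
        # Only forward embeddings when the caller pre-computed them.
        if embeddings is not None:
            kwargs["embeddings"] = embeddings[start:end]
        collection.add(**kwargs)
        calls += 1
    return calls
```

Validating the list lengths up front is deliberate: a mismatch discovered halfway through would leave the collection partially populated.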

Querying

The query API supports text queries (which get embedded), vector queries (when you’ve precomputed), filtering on metadata, and filtering on document content:

results = collection.query(
    query_texts=["how do I authenticate"],
    n_results=5,
    where={"doc_id": "doc-1"},
    where_document={"$contains": "API"},
)

The result is a dict with parallel lists for ids, documents, metadatas, distances, and (optionally) embeddings:

{
    "ids": [["doc-1:0", "doc-1:1"]],
    "documents": [["paragraph one text", "paragraph two text"]],
    "metadatas": [[{"doc_id": "doc-1", "position": 0}, ...]],
    "distances": [[0.21, 0.34]],
}

Note the double-nested lists. Chroma queries support batches of query texts, so the outer list is per-query. If you pass a single query string, you still get a list of length 1 on the outside. This trips up new users.
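
One way to sidestep the nesting is to flatten the slice for a single query into a list of per-hit dicts. The sketch below assumes the result has the shape shown above; rows_for_query and the singular key names are my own choices for illustration, not part of Chroma.

```python
def rows_for_query(results, query_index=0):
    """Flatten one query's slice of a query() result into a list of dicts.

    Hypothetical helper. Assumes `results` maps each field name to a list
    with one inner list per query, and skips fields that are missing or None.
    """
    singular = {
        "ids": "id",
        "documents": "document",
        "metadatas": "metadata",
        "distances": "distance",
        "embeddings": "embedding",
    }
    # Pick the inner list for the requested query from each present field.
    columns = {
        singular[key]: results[key][query_index]
        for key in singular
        if results.get(key) is not None
    }
    # Zip the parallel columns into one dict per hit.
    return [dict(zip(columns, values)) for values in zip(*columns.values())]
```

With the example result above, rows_for_query(results)[0]["id"] would be "doc-1:0", and each row carries its own document, metadata, and distance.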

The filter language is dict-based with MongoDB-style operators: $eq, $ne, $gt, $gte, $lt, $lte, which can be combined with $and and $or. The where_document filter matches against the document body and supports $contains, a plain substring match. There’s no regex support.

LangChain integration

Chroma’s tightest integration is with LangChain, which makes sense given the timing of both projects. The integration is a one-liner:

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

embeddings = OpenAIEmbeddings()
db = Chroma.from_texts(
    texts=texts,
    embedding=embeddings,
    metadatas=metadatas,
    persist_directory="./chroma_store",
)
db.persist()

This is genuinely useful for the “I want to test a retrieval idea in 20 lines of code” use case. The LangChain Chroma wrapper exposes a similarity_search method that returns LangChain Document objects, which compose with the rest of the framework.

I’ll cover LangChain itself in more depth in the LangChain framework intro post coming up later this week.

Client-server mode

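For multi-process access or to share a Chroma instance across notebooks, you can run it as a server. In 0.3.x the server is started with Docker Compose from a checkout of the Chroma repository:

git clone https://github.com/chroma-core/chroma.git
cd chroma
docker-compose up -d --build

And connect from the client by selecting the REST API implementation in Settings:

import chromadb
from chromadb.config import Settings

client = chromadb.Client(Settings(
    chroma_api_impl="rest",
    chroma_server_host="localhost",
    chroma_server_http_port="8000",
))
collection = client.get_collection("docs")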

This is useful for small team setups where multiple developers want to share an index. I would not run this as a “Chroma in production” pattern; the server doesn’t have authentication, clustering, or replication in 0.3.x.

Common Pitfalls

Forgetting to call persist(). Embedded Chroma keeps changes in memory until you persist. Kill the process before persisting and the data is gone. I’ve lost a few hours of ingestion work to this one.
Default L2 distance for cosine-suited embeddings. Always set metadata={"hnsw:space": "cosine"} (or ip) for text embeddings. L2 and cosine rank identically only when vectors are unit-normalized; with unnormalized embeddings the L2 default produces subtly wrong rankings that look correct in casual testing.
Treating add as idempotent without explicit IDs. Chroma’s own add requires ids, but wrappers such as LangChain’s Chroma class generate random ones when you omit them. Re-running ingestion then creates duplicates. Always provide deterministic IDs.
Reaching for the persisted collection across processes without locking. DuckDB is the backend, and concurrent writers can corrupt the database. Either use the client-server mode or serialize writes at the application layer.
Confusing collection metadata with document metadata. The metadata parameter on create_collection configures the collection itself (distance metric, HNSW params). The metadatas parameter on add is per-document. The names are easy to mix up.
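
For the deterministic-ID pitfall, a content-derived ID is one reasonable scheme. The sketch below is an assumption of mine rather than a Chroma convention: deterministic_id is a hypothetical helper that hashes the source, position, and text so that the same chunk always maps to the same ID.

```python
import hashlib


def deterministic_id(source, position, text):
    """Build a stable ID from a chunk's source, position, and content.

    Hypothetical helper. The NUL separators keep ("a", "bc") and
    ("ab", "c") from hashing to the same payload.
    """
    payload = f"{source}\x00{position}\x00{text}".encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()[:16]
    # Keep the readable prefix so IDs are still easy to eyeball.
    return f"{source}:{position}:{digest}"
```

Because identical input yields an identical ID, a re-run can look up or overwrite existing entries instead of minting new ones. The trade-off is that editing a chunk's text changes its ID, so stale entries need to be cleaned up separately; if that is a problem, hash only the source and position.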

When to graduate

A few signals that you’ve outgrown Chroma:

Your dataset is approaching 100K vectors. Performance starts to degrade noticeably past this point in 0.3.x.
You need multiple writers or readers in different processes. The client-server mode works, but the production story isn’t there.
You need durability guarantees. There’s no replication, no point-in-time recovery, no operational tooling.
You need hybrid search. Chroma doesn’t have BM25 in 0.3.x.
You need fine-grained access control or multi-tenancy. Not in scope for the embedded model.

When any of these become real, the migration path is straightforward: dump the collection, re-embed if you want to change models, and load into the production system of choice.
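
The dump step can be sketched as a conversion from parallel lists to one record per vector. This assumes the collection was read with something like collection.get(include=["embeddings", "documents", "metadatas"]) and that the result is a dict of flat parallel lists; export_records and to_jsonl are hypothetical helper names, and the JSONL layout is just one convenient interchange format.

```python
import json


def export_records(dump):
    """Turn a get()-style dict of parallel lists into one dict per vector.

    Hypothetical helper. Fields that are missing or None in `dump`
    are exported as None for every record.
    """
    ids = dump["ids"]

    def column(name):
        values = dump.get(name)
        return values if values is not None else [None] * len(ids)

    return [
        {"id": i, "document": d, "metadata": m, "embedding": e}
        for i, d, m, e in zip(
            ids, column("documents"), column("metadatas"), column("embeddings")
        )
    ]


def to_jsonl(records):
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(record) for record in records)
```

Keeping the documents in the export is what makes re-embedding possible later: if you switch models, drop the embedding field and recompute it from the document text before loading into the new system.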

Wrapping Up

Chroma is a tool I keep installed and rarely deploy, which is exactly the right relationship for a notebook-scale vector store. It’s not the answer for everything, but for the first week of any retrieval project it removes more friction than it adds.

The next and final post in this series ties everything together with LangChain, the framework that’s currently shaping how most teams build LLM applications on top of vector databases.
