The Vector Database Landscape in 2023: Pinecone, Milvus, Weaviate, and Chroma Compared
April 3, 2023 · 7 min read · by Muhammad Amal ai

TL;DR
  • Pinecone is the easiest managed option, but it locks you in and bills per pod.
  • Milvus and Weaviate are the strongest self-hosted choices, with Weaviate winning on developer ergonomics and Milvus on raw scale.
  • Chroma is best treated as a notebook-grade tool for prototyping, not a production system.

Six months ago, most engineers in my circle couldn’t say “vector database” without putting it in air quotes. Now half the LLM demos coming out of my team need one, and procurement questions are starting to land in my inbox. The market is moving fast enough that any comparison written today will be partially wrong by July, but the architectural divides between these four systems are stable enough to reason about.

This post is the writeup I wish someone had handed me when I started evaluating them in February. I’ll cover what each system actually is under the hood, where the rough edges are right now, and how I’d choose between them for a real workload.

I’m assuming you already know what an embedding is and why approximate nearest neighbor (ANN) search matters. If not, the embeddings deep-dive I’m planning for later this month will fill in that gap.

What you’re actually buying

A vector database is, in the most reductive sense, a key-value store where the keys are dense float vectors and the lookup primitive is “give me the k nearest neighbors to this vector under some distance metric.” Everything else (filtering, hybrid search, sharding, replication, persistence) is table stakes that the systems implement to varying degrees of maturity.
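In plain Python, the exact (brute-force) version of that primitive is only a few lines. This is a sketch for intuition, not how any of the four systems store data: the dict-based `store`, the `knn` helper, and the choice of cosine distance are all illustrative. Every ANN index discussed below exists to avoid this linear scan, which costs O(n·d) per query.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity: 0.0 means same direction, 2.0 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def knn(store: Dict[str, Vector], query: Vector, k: int) -> List[Tuple[str, float]]:
    """Exact k-nearest-neighbor lookup: score every key, keep the k closest."""
    scored = ((cosine_distance(vec, query), key) for key, vec in store.items())
    return [(key, dist) for dist, key in heapq.nsmallest(k, scored)]


# Illustrative three-entry store with 2-dimensional vectors
store = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(knn(store, [1.0, 0.1], k=2))  # "a" first, then "c"
```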

The interesting differences live in four places:

  1. Index type and tuning. HNSW, IVF, IVF-PQ, ScaNN, DiskANN. Each has its own recall/latency/memory tradeoff curve.
  2. Filtering model. Pre-filter, post-filter, or filter-during-search. This matters enormously when metadata cardinality is high.
  3. Operational model. Managed SaaS, self-hosted Kubernetes-native, or embedded library.
  4. Data model around the vector. Schema-on-write versus schema-on-read, what kind of metadata you can attach, and how rich the query language is.

The four systems below sit at very different points on each of these axes.
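The recall/latency tradeoff from the first axis is easiest to see with IVF, because it has one obvious dial. The toy below clusters the data with k-means, keeps one inverted list per centroid, and at query time scans only the `nprobe` closest lists. It is a pure-Python teaching sketch on random data (borrowing the `nlist`/`nprobe` names that Milvus and FAISS use), not a model of any product’s implementation, and it measures recall against exact search.

```python
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (same ranking as L2)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], nlist: int, iters: int = 10, seed: int = 0) -> List[List[float]]:
    """Plain Lloyd's k-means; IVF uses the centroids as a coarse quantizer."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, nlist)]
    for _ in range(iters):
        buckets: List[List[Vector]] = [[] for _ in range(nlist)]
        for p in points:
            nearest = min(range(nlist), key=lambda c: sq_l2(p, centroids[c]))
            buckets[nearest].append(p)
        for c, members in enumerate(buckets):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def exact_search(points: List[Vector], q: Vector, k: int) -> List[int]:
    """Ground truth: rank every point."""
    return sorted(range(len(points)), key=lambda i: sq_l2(q, points[i]))[:k]


class IVFIndex:
    def __init__(self, points: List[Vector], nlist: int):
        self.points = points
        self.centroids = kmeans(points, nlist)
        # Inverted lists: centroid id -> ids of the points assigned to that centroid
        self.lists: Dict[int, List[int]] = {c: [] for c in range(nlist)}
        for i, p in enumerate(points):
            self.lists[self._closest_lists(p, 1)[0]].append(i)

    def _closest_lists(self, q: Vector, n: int) -> List[int]:
        order = sorted(range(len(self.centroids)), key=lambda c: sq_l2(q, self.centroids[c]))
        return order[:n]

    def search(self, q: Vector, k: int, nprobe: int) -> List[int]:
        # Scan only the points in the nprobe closest lists: fewer probes, less work, lower recall
        candidates = [i for c in self._closest_lists(q, nprobe) for i in self.lists[c]]
        return sorted(candidates, key=lambda i: sq_l2(q, self.points[i]))[:k]


def mean_recall(index: IVFIndex, queries: List[Vector], k: int, nprobe: int) -> float:
    """Average fraction of the exact top-k that the IVF search also returns."""
    total = 0.0
    for q in queries:
        truth = set(exact_search(index.points, q, k))
        total += len(truth & set(index.search(q, k, nprobe))) / k
    return total / len(queries)


rng = random.Random(42)
data = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(500)]
queries = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(20)]
index = IVFIndex(data, nlist=16)
for nprobe in (1, 4, 16):
    print(nprobe, round(mean_recall(index, queries, k=10, nprobe=nprobe), 3))
```

Probing every list (`nprobe` equal to `nlist`) degenerates into exact search and recall reaches 1.0; probing fewer lists scans fewer vectors and gives up some recall. HNSW’s search-time `ef` parameter plays a similar role on a graph index.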

Pinecone

Pinecone is the most widely deployed managed vector database right now, and there’s a reason: the API surface is tiny. You create an index, you upsert vectors, you query. That’s the entire mental model.

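import pinecone

# Authenticate; the environment string must match the one your project lives in
pinecone.init(api_key="...", environment="us-west1-gcp")

# 1536 dimensions matches OpenAI's text-embedding-ada-002
pinecone.create_index(
    name="docs",
    dimension=1536,
    metric="cosine",
    pods=1,
    pod_type="p1.x1",
)

index = pinecone.Index("docs")
# Each record is an (id, values, metadata) tuple
index.upsert(vectors=[
    ("doc-1", [0.1] * 1536, {"source": "blog", "year": 2023}),
])

# Top-5 nearest neighbors, restricted to vectors whose metadata matches the filter
results = index.query(
    vector=[0.1] * 1536,
    top_k=5,
    filter={"year": {"$eq": 2023}},
    include_metadata=True,
)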

The pricing model is pod-based. You provision pods (p1, s1, p2) with fixed capacity, and you pay whether or not you’re using them. A p1.x1 pod handles roughly 1M 768-dim vectors and costs about $70 a month at current list prices. Storage and query throughput both scale with pods, not independently, which is great when your workload is balanced and frustrating when it isn’t. Pinecone has been hinting at a serverless tier for a while, but as of April 2023 it isn’t shipping.
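Because capacity only comes in whole pods, sizing is mostly arithmetic. Here is a rough sketch built on the two figures above. The assumption that per-pod capacity scales inversely with dimension is mine, so check it against Pinecone’s own sizing guidance before budgeting. Note that the 1536-dimension index in the example above halves the per-pod capacity relative to 768 dimensions.

```python
import math

# Figures quoted above (April 2023 list prices); treat both as approximate.
P1_X1_VECTORS_AT_768_DIMS = 1_000_000
P1_X1_MONTHLY_USD = 70.0


def p1_pods_needed(num_vectors: int, dimension: int) -> int:
    """Rough p1.x1 pod count.

    Assumption: per-pod capacity scales inversely with dimension. Metadata
    size and replicas are ignored, and both push the real number up.
    """
    vectors_per_pod = P1_X1_VECTORS_AT_768_DIMS * 768 / dimension
    return math.ceil(num_vectors / vectors_per_pod)


def p1_monthly_cost_usd(num_vectors: int, dimension: int) -> float:
    # Pods are billed whether or not they are busy, so cost is just pods x price
    return p1_pods_needed(num_vectors, dimension) * P1_X1_MONTHLY_USD


# 2M ada-002 embeddings (1536 dims): ~500K vectors per pod -> 4 pods -> ~$280/month
print(p1_pods_needed(2_000_000, 1536), p1_monthly_cost_usd(2_000_000, 1536))
```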

Two things to watch out for. First, there’s no way to self-host Pinecone for development, so your CI environment either pays for a real index or mocks the SDK. Second, the filter language is a MongoDB-style JSON syntax and is limited compared to what you’d get from a SQL-backed system. The Pinecone docs are good but lean on cookbook examples rather than reference material.
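One way to keep CI off a paid index is a small in-memory fake that accepts the same call shapes as the Pinecone snippet earlier in this section. The `FakeIndex` class below is hypothetical: it mimics only `upsert()` and `query()`, scores with cosine similarity, supports nothing but `$eq` filters, and returns plain dicts rather than the SDK’s response objects, so anything beyond simple retrieval logic still needs a test against a real index.

```python
import math
from typing import Any, Dict, List, Optional, Sequence, Tuple

Metadata = Dict[str, Any]
Filter = Dict[str, Dict[str, Any]]


class FakeIndex:
    """Hypothetical in-memory stand-in for a Pinecone index in CI tests.

    Scores with cosine similarity and understands only
    {"field": {"$eq": value}} filters.
    """

    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[List[float], Metadata]] = {}

    def upsert(self, vectors: Sequence[Tuple[str, List[float], Metadata]]) -> None:
        # Same (id, values, metadata) tuples as the real upsert call; later ids overwrite
        for vec_id, values, metadata in vectors:
            self._rows[vec_id] = (list(values), dict(metadata))

    @staticmethod
    def _passes(metadata: Metadata, flt: Optional[Filter]) -> bool:
        for field, condition in (flt or {}).items():
            if set(condition) != {"$eq"}:
                raise NotImplementedError(f"fake supports only $eq, got {condition}")
            if metadata.get(field) != condition["$eq"]:
                return False
        return True

    @staticmethod
    def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    def query(self, vector: Sequence[float], top_k: int,
              filter: Optional[Filter] = None,  # keyword name mirrors the query() call above
              include_metadata: bool = False) -> Dict[str, List[Dict[str, Any]]]:
        matches = []
        for vec_id, (values, metadata) in self._rows.items():
            if not self._passes(metadata, filter):
                continue
            match: Dict[str, Any] = {"id": vec_id, "score": self._cosine(vector, values)}
            if include_metadata:
                match["metadata"] = metadata
            matches.append(match)
        # Highest similarity first, truncated to top_k
        matches.sort(key=lambda m: m["score"], reverse=True)
        return {"matches": matches[:top_k]}
```

In tests, pass a `FakeIndex()` wherever application code would otherwise receive `pinecone.Index("docs")`.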

I’ll cover Pinecone setup in much more detail in the Pinecone getting started post later this week.

Milvus

Milvus is the heavyweight of the open-source vector databases. Version 2.2.x is the current series, and it’s a proper distributed system: a proxy layer, a set of coordinators, and separate query, data, and index nodes, all backed by etcd for metadata, MinIO or S3 for object storage, and Pulsar or Kafka as the message log. You can absolutely run it on a single VM in standalone mode for development, but the architecture is built for fleet operation.

The index support is the broadest of any system here. HNSW, IVF_FLAT, IVF_SQ8, IVF_PQ, ANNOY, and a GPU-accelerated IVF variant if you have the hardware. For workloads above ~100M vectors this is where the conversation has to start.

from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType

# 19530 is Milvus's default gRPC port
connections.connect(host="localhost", port="19530")

# Auto-generated INT64 primary key, a 768-dim embedding, and a string metadata field
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=768),
    FieldSchema(name="source", dtype=DataType.VARCHAR, max_length=128),
]
schema = CollectionSchema(fields, description="docs")
col = Collection("docs", schema)

# HNSW with inner-product distance; IP equals cosine only if embeddings are normalized
col.create_index(
    field_name="embedding",
    index_params={
        "index_type": "HNSW",
        "metric_type": "IP",
        "params": {"M": 16, "efConstruction": 200},
    },
)

The cost of all that flexibility is operational complexity. A real Milvus deployment has six or seven moving parts, and the Helm chart has surprised more than one team I’ve talked to. The Milvus documentation is improving but still has gaps where the right answer is “read the Go source.”

If you’re not on Kubernetes already, Milvus is going to feel like a heavy commitment. If you are, and you need scale, it’s the safest bet in the open-source column.

Weaviate

Weaviate sits in an interesting middle ground. It’s open-source like Milvus, but it ships as a single Go binary with a built-in vector index (HNSW only), and its developer ergonomics are noticeably better than Milvus’s. It also offers first-class hybrid search out of the box, combining BM25 keyword scoring with vector similarity, which is something Milvus only got recently and Pinecone still doesn’t have natively.

import weaviate

client = weaviate.Client("http://localhost:8080")

# "vectorizer": "none" means we supply our own vectors instead of using a Weaviate module
schema = {
    "class": "Doc",
    "vectorizer": "none",
    "properties": [
        {"name": "content", "dataType": ["text"]},
        {"name": "source", "dataType": ["string"]},
    ],
}
client.schema.create_class(schema)

# Objects are sent in batches of 100; the with-block flushes any remainder on exit
client.batch.configure(batch_size=100)
with client.batch as batch:
    batch.add_data_object(
        {"content": "hello world", "source": "blog"},
        class_name="Doc",
        vector=[0.1] * 768,
    )
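Hybrid search boils down to merging a keyword ranking with a vector ranking. The sketch below uses weighted reciprocal rank fusion, a common generic technique, with an `alpha` knob that follows Weaviate’s convention (1.0 is pure vector, 0.0 is pure BM25). Weaviate’s actual fusion method may differ, and the document ids and rankings are made up, so read this as the idea rather than its implementation.

```python
from typing import Dict, List


def fused_ranking(keyword_ranked: List[str], vector_ranked: List[str],
                  alpha: float = 0.5, k: int = 60) -> List[str]:
    """Weighted reciprocal-rank fusion of a BM25 ranking and a vector ranking.

    alpha=1.0 keeps only the vector ranking and alpha=0.0 only the keyword one.
    k=60 is the constant commonly used with reciprocal rank fusion.
    """
    scores: Dict[str, float] = {}
    for weight, ranked in ((alpha, vector_ranked), (1.0 - alpha, keyword_ranked)):
        for rank, doc_id in enumerate(ranked, start=1):
            # Each list contributes weight / (k + rank); documents found by both lists add up
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


keyword_hits = ["doc-3", "doc-1", "doc-7"]  # hypothetical BM25 order for "hello world"
vector_hits = ["doc-1", "doc-5", "doc-3"]   # hypothetical nearest neighbors of the query embedding
print(fused_ranking(keyword_hits, vector_hits, alpha=0.5))
```

Documents that both retrievers rank highly (here `doc-1`) rise to the top, which is the practical payoff of hybrid search on queries with rare keywords.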

The data model is GraphQL-flavored, which is either a feature or a quirk depending on your tastes. Filters compose well with vector queries, and the cross-reference support is genuinely useful when your domain has relationships.

The biggest caveat: Weaviate is HNSW-only, and HNSW keeps its graph and vectors in memory. If your dataset doesn’t fit in RAM, you’re in trouble. That’s fine up to tens of millions of vectors on commodity hardware, but at the high end Milvus, with its wider choice of index types, becomes the better path.
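Whether an HNSW index fits in RAM is back-of-envelope arithmetic. The estimate below counts only float32 vectors and the base graph layer, assuming up to 2·M neighbor ids of 4 bytes each per node (the hnswlib convention); `M=16` mirrors the Milvus example above, and the function name is illustrative. Real deployments add upper layers, metadata, and allocator overhead, so treat the output as a floor.

```python
def hnsw_memory_gb(num_vectors: int, dimension: int, m: int = 16) -> float:
    """Back-of-envelope RAM floor for an in-memory HNSW index.

    Counts float32 vectors (4 bytes per dimension) plus the base graph layer,
    assumed to hold up to 2*M neighbor ids per node at 4 bytes each.
    Upper layers, metadata, and allocator overhead are ignored.
    """
    vector_bytes = num_vectors * dimension * 4
    graph_bytes = num_vectors * 2 * m * 4
    return (vector_bytes + graph_bytes) / 1e9


for n in (1_000_000, 10_000_000, 100_000_000):
    print(f"{n:>11,} x 768 dims -> ~{hnsw_memory_gb(n, 768):.1f} GB")
```

At 768 dimensions the raw vectors dominate (about 30.7 of the 32 GB at 10M vectors), so vector count times dimension is the number to watch.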

Chroma

Chroma is the youngest of the four and deliberately scoped down. Version 0.3.x is what’s shipping in April 2023, and the system is best described as “SQLite for embeddings.” It runs in-process by default, can persist to a local directory, and the API is intentionally tiny.

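import chromadb

# In-process client (the default); no server required
client = chromadb.Client()
collection = client.create_collection("docs")

# Parallel lists: one entry per item across documents, embeddings, metadatas, and ids
collection.add(
    documents=["hello world"],
    embeddings=[[0.1] * 1536],
    metadatas=[{"source": "blog"}],
    ids=["doc-1"],
)

# Keep n_results <= the number of stored items; 0.3.x can raise NotEnoughElementsException otherwise
results = collection.query(query_embeddings=[[0.1] * 1536], n_results=1)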

I love Chroma for prototypes. It’s the right tool for “I want to test a retrieval idea in a notebook before committing to infrastructure.” It’s also tightly integrated with LangChain, and because LangChain exposes Pinecone, Milvus, and Weaviate through the same vector-store interface, swapping Chroma out later is a small change.

What I would not do, today, is run Chroma in production with a real workload. The persistence story is still rough, the indexing is approximate-only with no tuning knobs, and there’s no built-in clustering. The Chroma team is moving fast and this will likely change, but it’s where things stand right now.

Common Pitfalls

A few things I’ve watched teams trip over in the last quarter:

  • Confusing recall with latency. All four systems hit sub-100ms latency on small indexes. The recall numbers diverge much more than the latency numbers, and recall is what your users feel. Always benchmark with your actual data.
  • Underestimating reindex cost. Adding a new field to your schema, changing dimensions, or switching distance metrics usually means rebuilding the index from scratch. Plan for that operation to take hours on a million-vector index.
  • Pre-filter versus post-filter. Pinecone and Weaviate filter during search, which scales well. Some self-hosted setups effectively post-filter: they take the top-k first and then drop non-matching results, so a highly selective filter (one that matches only a small fraction of your data) can leave you with fewer than k results, or none, even when matches exist.
  • Cosine versus dot product versus L2. OpenAI’s text-embedding-ada-002 returns vectors normalized to unit length, so cosine similarity and dot product give identical rankings and dot product is cheaper to compute. Most sentence-transformers models do not normalize their output by default, so there the choice of metric changes results. Pick a metric consistent with how your model was trained.
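Two of these pitfalls reproduce in a few lines of plain Python. In this toy (made-up ids, brute-force dot-product scoring), `post_filter_search` takes the top-k and then filters, while `pre_filter_search` filters first and stands in for the answer a correct filter-during-search implementation should return. The second half shows dot product and cosine disagreeing on unnormalized vectors and agreeing once the vectors are normalized.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def normalize(v: Vector) -> List[float]:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def post_filter_search(vectors: Dict[str, Vector], meta: Dict[str, dict], query: Vector,
                       k: int, keep: Callable[[dict], bool]) -> List[str]:
    """Take the global top-k first, then drop rows that fail the filter."""
    top_k = sorted(vectors, key=lambda i: dot(query, vectors[i]), reverse=True)[:k]
    return [i for i in top_k if keep(meta[i])]


def pre_filter_search(vectors: Dict[str, Vector], meta: Dict[str, dict], query: Vector,
                      k: int, keep: Callable[[dict], bool]) -> List[str]:
    """Restrict to rows that pass the filter, then take the top-k."""
    allowed = [i for i in vectors if keep(meta[i])]
    return sorted(allowed, key=lambda i: dot(query, vectors[i]), reverse=True)[:k]


def only_papers(m: dict) -> bool:
    return m["source"] == "paper"


# Pitfall 1: a selective filter starves a post-filtered top-k.
vectors = {f"blog-{i}": [1.0, 0.01 * i] for i in range(10)}
vectors["paper-0"] = [0.0, 1.0]
meta = {i: {"source": i.split("-")[0]} for i in vectors}
print(post_filter_search(vectors, meta, [1.0, 0.0], 5, only_papers))  # []
print(pre_filter_search(vectors, meta, [1.0, 0.0], 5, only_papers))   # ['paper-0']

# Pitfall 2: on unnormalized vectors, dot product and cosine disagree.
docs = {"x": [1.0, 0.0], "y": [3.0, 3.0]}
q = [1.0, 0.0]
print(max(docs, key=lambda n: dot(q, docs[n])))             # y: the longer vector wins
print(max(docs, key=lambda n: cosine(q, docs[n])))          # x: the closer direction wins
print(max(docs, key=lambda n: dot(q, normalize(docs[n]))))  # x: agrees once normalized
```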

What I’d actually pick

If I had to make the decision today, with no prior commitments:

  • Prototype or single-user demo: Chroma.
  • Production, under 10M vectors, want managed: Pinecone.
  • Production, want self-hosted, value developer experience: Weaviate.
  • Production, over 50M vectors, already on Kubernetes: Milvus.

The honest answer is that all four are good enough for the 80% case, and your team’s existing operational competence matters more than the specific system you pick.

Wrapping Up

The vector database market in April 2023 is in the same place the NoSQL market was in 2010: lots of options, all of them improving fast, and the wrong choice today is rarely fatal. Pick the one that fits your operational reality, write a thin abstraction over it, and move on to the actual hard problem, which is building the embedding pipeline and the application around it.

The next post in this series will walk through Pinecone end-to-end, including the index sizing math that’s surprisingly easy to get wrong.