Choosing a Vector Database: Pinecone vs Qdrant vs pgvector
TL;DR — pgvector wins if you’re already on Postgres and under ~10M chunks. Qdrant wins for serious hybrid search and self-hosting. Pinecone serverless wins for low-ops and bursty workloads. Pick by your real constraints, not the leaderboard.
In last week’s piece I waved at the 2024 vector DB landscape. This post is the honest comparison. I’ve shipped production systems on all three of Pinecone, Qdrant, and pgvector in the last 18 months, and the right answer depends on constraints most benchmark posts ignore.
Skip the benchmark wars. All three hit the latency targets most RAG workloads actually need. The interesting differences are in how you operate them, how they handle filters, what they cost, and how they behave when something goes wrong.
The shortlist, briefly
Three options dominate production RAG in early 2024:
- Pinecone — fully managed, the original commercial vector DB. Launched serverless tier January 2024. No self-host option.
- Qdrant: open-source, Rust-based, with a hosted cloud. v1.7 (December 2023) introduced sparse vector support alongside the existing named vectors.
- pgvector — Postgres extension. v0.5 (released August 2023, mature through January 2024) added HNSW indexes, putting it in shouting distance of dedicated vector DBs on read latency.
There are others — Weaviate, Milvus, Chroma, LanceDB, Vespa — but the three above cover roughly 80% of production deployments I’ve seen this year. I’ll keep the scope tight.
Pinecone serverless, where it fits
Pinecone’s serverless tier is the headline 2024 change. The old pod-based model charged per pod per hour, which made small indexes expensive. Serverless decouples storage from query compute and charges per read/write unit. A small index that gets a few thousand queries a day costs single-digit dollars per month.
# Pinecone serverless — January 2024
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=PINECONE_API_KEY)
pc.create_index(
    name="docs",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

index = pc.Index("docs")
index.upsert([
    ("doc-1", [0.1, 0.2, ...], {"team": "engineering", "type": "wiki"}),
])

results = index.query(
    vector=query_vec,
    top_k=10,
    filter={"team": {"$eq": "engineering"}},
    include_metadata=True,
)
What you get:
- Effectively zero operations. No clusters to manage, no resize events, no version upgrades.
- Strong metadata filtering with indexed filter fields (declare them on index creation).
- Reasonable hybrid search via sparse vectors (Pinecone added this in 2023; it’s adequate but less expressive than Qdrant).
- Production-grade reliability — multi-AZ, fast snapshots, audit logs.
What you give up:
- No self-hosting. Compliance shops with data residency requirements need to verify Pinecone’s region support.
- Pricing can balloon with high-cardinality filters or large embeddings (3072-dim from text-embedding-3-large doubles your write units versus 1536-dim).
- You’re locked in. There’s an export path but it’s not trivial.
If your org is fine with managed services and you don’t have aggressive cost ceilings, Pinecone serverless is the easiest day-one choice.
Qdrant v1.7, where it fits
Qdrant is what I reach for when I want power and don’t mind ops. Self-hosting a small Qdrant cluster is genuinely simple — a single Docker container handles single-node, the Helm chart handles clusters.
# Qdrant v1.7 single-node, January 2024
docker run -p 6333:6333 \
-v $(pwd)/qdrant_storage:/qdrant/storage \
qdrant/qdrant:v1.7.4
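The standout feature in 2024 is the unified dense-plus-sparse hybrid search with native RRF (mentioned in last week’s piece). The prefetch-and-fusion Query API below requires Qdrant 1.10 or later with a matching qdrant-client; on the v1.7.4 image above, run the dense and sparse searches separately and fuse the results client-side.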
# Qdrant hybrid search via prefetch + fusion
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")
client.create_collection(
    collection_name="docs",
    vectors_config={
        "dense": models.VectorParams(size=1536, distance=models.Distance.COSINE),
    },
    sparse_vectors_config={"sparse": models.SparseVectorParams()},
)

results = client.query_points(
    collection_name="docs",
    prefetch=[
        models.Prefetch(query=dense_vec, using="dense", limit=20),
        models.Prefetch(
            query=models.SparseVector(indices=sparse_idx, values=sparse_val),
            using="sparse",
            limit=20,
        ),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=5,
    query_filter=models.Filter(
        must=[models.FieldCondition(key="team", match=models.MatchValue(value="eng"))],
    ),
)
What you get:
- The most expressive hybrid search of the three. Multi-vector per point, named vectors, the works.
- Excellent filter performance. Indexed payload fields support fast filtered ANN, which most vector DBs struggle with.
- Open-source, self-hostable, with a hosted cloud if you outgrow ops.
- Active development, responsive maintainers.
What you give up:
- You operate it. Single-node is easy; multi-node with replicas wants a real platform engineer.
- The hosted cloud is less mature than Pinecone’s. Region coverage and SLA are improving but not at par yet.
- Newer feature surface means more sharp edges to discover.
pgvector 0.5, where it fits
pgvector got serious in late 2023 with HNSW indexes. The combination of “vectors live next to your relational data” plus “Postgres ergonomics for filtering and joins” is unmatched if your corpus isn’t huge.
-- pgvector 0.5 with HNSW, January 2024
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE doc_chunks (
    id BIGSERIAL PRIMARY KEY,
    doc_id TEXT NOT NULL,
    team TEXT NOT NULL,
    content TEXT NOT NULL,
    embedding vector(1536),
    created_at TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX ON doc_chunks USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
CREATE INDEX ON doc_chunks (team);
-- Query
SELECT id, doc_id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM doc_chunks
WHERE team = $2
ORDER BY embedding <=> $1::vector
LIMIT 10;
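The <=> operator in that query is pgvector's cosine distance, which is why the SELECT computes similarity as one minus the distance while the ORDER BY sorts by the raw distance. Here is a minimal Python sketch of the same arithmetic, useful for sanity-checking a few rows by hand. The function names are mine, not part of pgvector.

```python
import math

def cosine_distance(a, b):
    """Cosine distance: 1 - cos(angle between a and b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def similarity(a, b):
    """Mirrors `1 - (embedding <=> query)` from the SQL above."""
    return 1.0 - cosine_distance(a, b)

# Same direction scores 1, orthogonal scores 0, opposite scores -1
print(similarity([1.0, 0.0], [2.0, 0.0]))
print(similarity([1.0, 0.0], [0.0, 3.0]))
print(similarity([1.0, 0.0], [-1.0, 0.0]))
```

Sorting by distance ascending and sorting by similarity descending give the same order, so the query only needs the distance in its ORDER BY.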
What you get:
- One database to operate. Postgres backups, replication, monitoring all just work.
- ACID transactions over vectors and metadata. If your app needs to insert a document and its chunks atomically, you can.
- SQL: joins, window functions, the works. You can do WHERE-based pre-filtering combined with vector search in expressive ways.
- Cost is the cost of running Postgres. No per-vector pricing.
What you give up:
- Scale. pgvector with HNSW handles ~10M vectors per table well. Past that, recall and latency both degrade.
- No native hybrid search. You combine with tsvector and your own RRF in SQL, which is workable but not pretty.
- HNSW index build time on large datasets is significant. Plan for hours, not minutes.
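If the SQL version of "your own RRF" gets unwieldy, the fusion step can also live in application code. The sketch below assumes you have already run two queries, one ordered by vector distance and one by full-text rank, and have each result as a best-first list of row IDs. The function name, the example IDs, and k=60 (a commonly used constant) are my assumptions, not anything pgvector provides.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60, limit=10):
    """Fuse ranked ID lists with reciprocal rank fusion.

    rankings: list of ID lists, each ordered best-first.
    k: smoothing constant; 60 is a commonly used value (an assumption here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic
    fused = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in fused[:limit]]

vector_hits = [3, 1, 7, 9]   # hypothetical IDs from the vector-distance query
keyword_hits = [7, 3, 4]     # hypothetical IDs from a tsvector full-text query
print(rrf_fuse([vector_hits, keyword_hits], limit=3))  # [3, 7, 1]
```

RRF only looks at rank positions, so the cosine distances and the full-text scores never have to be put on a common scale.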
For startups and any team already on Postgres, pgvector is the right starting point. Migrate later if you outgrow it.
The actual decision matrix
Skip the benchmarks. The questions that matter:
- Is this a side project or production? Side project — Chroma in-memory is fine. Production — pick from above.
- How many vectors will you have in 12 months? Under 10M and you can stay on pgvector. 10-100M is comfortable on Pinecone or Qdrant, while pgvector is past the range where it holds up well. Over 100M, Pinecone or Qdrant.
- Do you need hybrid search? Yes — Qdrant. Sort of — Pinecone or pgvector with workarounds.
- Do you have ops headcount? Yes — Qdrant or pgvector self-hosted. No — Pinecone serverless or Qdrant Cloud.
- Multi-tenant with strict isolation? All three handle it, but Qdrant’s collections-per-tenant model is probably the cleanest.
- Compliance constraints on data residency? Verify each provider’s region support; pgvector self-hosted is the easy answer here.
I default to pgvector for new projects until proven inadequate. The migration off pgvector when you outgrow it is real but rarely catastrophic; pre-optimizing for scale you don’t yet have costs you 6 months.
The filter pushdown question
The one technical detail that separates serious vector DBs from toys is what they do with filters during ANN search.
Naive approach: do ANN, get top-K, post-filter. If the filter is selective (say 1% of vectors match), top-K of 10 might return zero matches. Bad.
Better approach: integrate the filter into the HNSW graph traversal, so the search only considers matching candidates. This is hard and brittle. Different DBs implement it differently:
- Qdrant has the best filter pushdown I’ve measured. Selectivity down to 0.1% still returns useful results in reasonable time.
- Pinecone indexes specific filter fields you declare upfront. Fast for declared fields, slower for ad-hoc.
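- pgvector with HNSW + WHERE leaves the strategy to the planner. With a selective filter it can use the B-tree index and rank the matching rows exactly, which works well. When it scans the HNSW index instead, the filter is applied after the scan, so a query can return fewer than LIMIT rows unless you raise hnsw.ef_search. Exact ranking over a filter that matches millions of rows is slow.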
If you have many filter fields with varying selectivity, prototype with your real query patterns before committing.
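To make the post-filter failure mode concrete, here is a small simulation using only the standard library. It uses exact brute-force search as a stand-in for ANN, random 8-dimensional vectors, and a synthetic filter that 1% of points pass. All of those are assumptions for illustration, not a model of any particular database.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def top_k(ids, vectors, query, k):
    """Exact nearest neighbours among `ids`, best first."""
    return sorted(ids, key=lambda i: squared_distance(vectors[i], query))[:k]

rng = random.Random(0)
n, dim, k = 10_000, 8, 10
vectors = [[rng.random() for _ in range(dim)] for _ in range(n)]
query = [rng.random() for _ in range(dim)]
matches = {i for i in range(n) if i % 100 == 0}  # 1% of points pass the filter

# Naive: search everything, then drop hits that fail the filter
post_filtered = [i for i in top_k(range(n), vectors, query, k) if i in matches]

# Filter-aware: only matching points are ever candidates
pre_filtered = top_k(sorted(matches), vectors, query, k)

print(len(post_filtered), "results after post-filtering")
print(len(pre_filtered), "results after pre-filtering")
```

With a 1% filter the global top 10 contains, on average, a tenth of one matching point, so the post-filtered list is usually empty while the filter-aware search always fills all ten slots.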
Common Pitfalls
- Picking by benchmark. All three will meet most latency targets. Pick by ops fit and feature fit.
- Underestimating filter cardinality. A filter that returns 0.1% of vectors will perform very differently from one returning 50%. Test both.
- Skipping the deletion story. Vector DBs handle deletes differently. Some require periodic rebuilds. Verify before relying on real-time corpus updates.
- Mixing embedding models in one collection. All vectors in a collection must come from the same model. Re-indexing on model swaps is mandatory.
- Ignoring serialization size. A 1536-dim float32 vector is 6KB. A million-vector index is 6GB before metadata. Plan for it.
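The sizing arithmetic from the last pitfall is easy to run for your own corpus. This is a rough sketch that counts raw float32 values only; the function name is mine, and index structures and metadata add to the total.

```python
def raw_vector_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw embedding storage, excluding index structures and metadata."""
    return num_vectors * dim * bytes_per_value

per_vector = raw_vector_bytes(1, 1536)        # 6144 bytes, i.e. 6 KB
million = raw_vector_bytes(1_000_000, 1536)   # about 6.1 GB
large = raw_vector_bytes(1_000_000, 3072)     # twice that at 3072 dimensions

print(per_vector, million / 1e9, large / 1e9)
```

Swap in your own vector count and embedding dimension before sizing disks or memory.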
Wrapping Up
The 2024 vector DB landscape is a buyer’s market — three solid options, each with a clear best-fit profile. pgvector for “I’m already on Postgres,” Qdrant for “I need real hybrid search and can run infra,” Pinecone serverless for “I want zero ops and predictable enough costs.” Pick the one that matches your real constraints and move on.
Next post covers embedding models — text-embedding-3-small vs Cohere v3 vs the open-source contenders. The choice there is more consequential than the DB choice, and gets less attention.
For deeper reading, the pgvector docs and Qdrant docs both have honest performance notes.