Weaviate 1.18 and Hybrid Search: When Keyword and Vector Search Are Both Right
TL;DR
- Pure vector search misses exact-match queries (product codes, names, technical terms). BM25 alone misses paraphrases. Hybrid search blends both with a single alpha knob.
- Weaviate 1.18 ships hybrid search as a first-class query, which is rare among vector DBs in April 2023.
- Tune alpha empirically; the right value varies widely by domain. Start at 0.5 and adjust based on a held-out test set.
The first time I deployed pure vector search for an internal docs site, the feedback was uniformly positive on semantic queries (“how do I authenticate to the API”) and uniformly negative on lookup queries (“ERR_AUTH_401”). The second category just didn’t work, because the embedding model doesn’t know that ERR_AUTH_401 is a specific token that should match exactly.
Hybrid search fixes this. You run vector retrieval and keyword retrieval in parallel, then fuse the scores. Weaviate 1.18 makes this a one-line query, which is genuinely useful and rare in the vector DB space right now. The Milvus production post from earlier this week covers the alternative for teams who need bigger scale; Weaviate is what I’d reach for first when scale isn’t the constraint.
This post walks through Weaviate setup, schema design, hybrid query mechanics, and the alpha-tuning process that actually matters for quality.
Why hybrid
A toy example to make the case concrete. Consider these documents:
- A: “Authenticate to the REST API using your API key in the Authorization header.”
- B: “When you see error code 401, your API key is invalid or missing.”
- C: “Rate limits are enforced per API key.”
And these queries:
- Q1: “how do I log in”
- Q2: “ERR_AUTH_401”
Pure vector search on Q1 retrieves A, B, C in roughly that order, because all three are semantically related to authentication. Pure BM25 on Q1 retrieves nothing useful, because none of the documents contain “log in.”
Pure vector search on Q2 retrieves the documents in roughly the same order, because the embedding model doesn’t know that “401” is a meaningful exact-match token. Pure BM25 on Q2 retrieves B first, because it’s the only document containing “401.”
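Hybrid search ranks the right document first for both queries. On Q1 the keyword side contributes almost nothing, so the vector ranking carries the result; on Q2 the BM25 match on “401” pushes B to the top.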
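The keyword half of the toy example is easy to reproduce without a database. The sketch below is a minimal BM25 scorer over documents A, B, and C. The tokenizer (lowercase, split on anything that is not a letter or digit) and the function names are assumptions made for illustration; this is not Weaviate's implementation, only the textbook formula with the usual k1 and b defaults.

```python
import math
import re

DOCS = {
    "A": "Authenticate to the REST API using your API key in the Authorization header.",
    "B": "When you see error code 401, your API key is invalid or missing.",
    "C": "Rate limits are enforced per API key.",
}


def tokenize(text):
    # Assumed tokenizer: lowercase, then split on anything that is not a
    # letter or digit, so "ERR_AUTH_401" becomes ["err", "auth", "401"].
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def bm25_scores(query, docs, k1=1.2, b=0.75):
    # Textbook BM25 over a tiny in-memory corpus.
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    avg_len = sum(len(toks) for toks in tokenized.values()) / len(tokenized)
    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, toks in tokenized.items():
        score = 0.0
        for term in query_terms:
            tf = toks.count(term)
            if tf == 0:
                continue
            # Number of documents that contain the term at least once.
            df = sum(1 for other in tokenized.values() if term in other)
            idf = math.log(1 + (len(tokenized) - df + 0.5) / (df + 0.5))
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores[doc_id] = score
    return scores


q2_scores = bm25_scores("ERR_AUTH_401", DOCS)
print(q2_scores)
```

Only B gets a non-zero score for Q2, because “401” is the only query token that appears in any document. That is the signal a pure embedding lookup has no reliable way to recover.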
Setting up Weaviate
Weaviate runs as a single Go binary with the HNSW vector index built in. Docker Compose is the easy local path; for production, the official Helm chart works and is lighter weight than Milvus’s.
# docker-compose.yml
version: "3.4"
services:
  weaviate:
    image: semitechnologies/weaviate:1.18.3
    ports:
      - 8080:8080
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
      PERSISTENCE_DATA_PATH: "/var/lib/weaviate"
      DEFAULT_VECTORIZER_MODULE: "none"
      ENABLE_MODULES: ""
      CLUSTER_HOSTNAME: "node1"
    volumes:
      - weaviate_data:/var/lib/weaviate
volumes:
  weaviate_data:
Setting DEFAULT_VECTORIZER_MODULE: "none" means I’ll provide vectors at upload time rather than letting Weaviate call out to an inference service. Weaviate has built-in modules for OpenAI, Cohere, and Hugging Face, but I prefer to keep embedding out of the database layer. It’s cleaner operationally and lets me change embedding providers without touching the database configuration. The vectors themselves still have to be recomputed and re-imported when the model changes.
Schema and data model
Weaviate’s data model is class-oriented. A class is roughly a table, with properties (columns) and an optional vector per object.
import weaviate

client = weaviate.Client("http://localhost:8080")

doc_class = {
    "class": "Doc",
    "vectorizer": "none",
    "vectorIndexType": "hnsw",
    "vectorIndexConfig": {
        "distance": "cosine",
        "efConstruction": 200,
        "maxConnections": 16,
        "ef": -1,
    },
    "invertedIndexConfig": {
        "bm25": {"b": 0.75, "k1": 1.2},
    },
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "content", "dataType": ["text"]},
        {"name": "source", "dataType": ["string"]},
        {"name": "url", "dataType": ["string"]},
        {"name": "doc_id", "dataType": ["string"]},
    ],
}

if client.schema.exists("Doc"):
    client.schema.delete_class("Doc")
client.schema.create_class(doc_class)
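Two things to note. First, text and string are different types. text is lowercased and split on non-alphanumeric characters before it is BM25-indexed. string is not lowercased and is split only on whitespace by default, or indexed as a single atomic value if you set its tokenization to field (good for IDs, source names, anything you’ll filter on). If you want hybrid search to behave well on a field, make it text.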
Second, the BM25 parameters are tunable per class. The defaults (k1=1.2, b=0.75) are reasonable and rarely worth touching unless you’ve benchmarked alternatives.
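To make the text versus string distinction concrete, here is a rough sketch of the tokenization behaviors involved. The three functions are approximations written for illustration (ASCII-only, with hypothetical names); they are not Weaviate's code, and the sample string is made up.

```python
import re


def tokenize_word_text(value):
    # Approximates "word" tokenization on a text property:
    # lowercase, then split on non-alphanumeric characters.
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]


def tokenize_word_string(value):
    # Approximates "word" tokenization on a string property:
    # split on whitespace only, keep case and punctuation.
    return value.split()


def tokenize_field(value):
    # Approximates "field" tokenization: the whole trimmed value is one token.
    return [value.strip()]


sample = "Error ERR_AUTH_401: invalid API key"
print(tokenize_word_text(sample))
print(tokenize_word_string(sample))
print(tokenize_field(sample))
```

Under these assumptions a keyword query for “401” can only hit the first form. The second keeps “ERR_AUTH_401:” as one case-sensitive token, and the third only matches the entire value.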
Ingestion
The batch API is the right way to ingest at scale. Weaviate’s Python client handles the batching cleanly:
client.batch.configure(
    batch_size=100,
    dynamic=True,
    timeout_retries=3,
    # No-op callback: per-object errors in the batch response are discarded.
    # Swap in a function that logs them before running this in production.
    callback=lambda r: None,
)

with client.batch as batch:
    for doc in documents:
        batch.add_data_object(
            data_object={
                "title": doc.title,
                "content": doc.content,
                "source": doc.source,
                "url": doc.url,
                "doc_id": doc.id,
            },
            class_name="Doc",
            uuid=deterministic_uuid(doc.id),
            vector=doc.vector,
        )
deterministic_uuid should be a stable hash of the source ID. Weaviate uses UUIDs as primary keys, and stable UUIDs make re-ingestion idempotent.
import uuid

def deterministic_uuid(source_id: str) -> str:
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, source_id))
dynamic=True lets the client adjust batch size based on observed latency, which is the right default for production.
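A quick way to convince yourself that re-ingestion is idempotent is to generate the UUIDs twice and compare. The sketch below uses only the standard library; the source IDs are hypothetical placeholders.

```python
import uuid


def deterministic_uuid(source_id: str) -> str:
    # uuid5 is a namespaced SHA-1 hash: same input, same UUID, every run.
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, source_id))


source_ids = ["docs/auth.md", "docs/rate-limits.md"]  # hypothetical IDs

first_run = {sid: deterministic_uuid(sid) for sid in source_ids}
second_run = {sid: deterministic_uuid(sid) for sid in source_ids}

# Identical across runs, distinct across documents.
print(first_run == second_run)
print(len(set(first_run.values())) == len(source_ids))
```

Because the UUID depends only on the source ID, a second ingestion run targets the same objects instead of creating duplicates.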
Hybrid query
The hybrid query syntax is GraphQL-flavored:
def search_hybrid(query: str, query_vector: list[float], alpha: float = 0.5, limit: int = 10):
    result = (
        client.query
        .get("Doc", ["title", "content", "source", "url", "doc_id"])
        .with_hybrid(
            query=query,
            vector=query_vector,
            alpha=alpha,
        )
        .with_limit(limit)
        .with_additional(["score", "explainScore"])
        .do()
    )
    return result["data"]["Get"]["Doc"]
alpha is the fusion weight. alpha=0 is pure BM25. alpha=1 is pure vector. The function above defaults to 0.5; if you omit alpha entirely, Weaviate falls back to its own default of 0.75. The fusion itself is rank-based (reciprocal rank fusion, RRF), which is more robust than simple linear score blending because the two score scales are not directly comparable.
with_additional(["score", "explainScore"]) returns the fused score and a per-result explanation, which is invaluable when debugging quality issues.
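For intuition about what alpha does to a rank-based fusion, here is a minimal sketch of alpha-weighted reciprocal rank fusion. The function name, the k=60 smoothing constant, and the exact weighting are assumptions chosen for illustration; Weaviate's internal formula may differ in its details.

```python
def fuse_rankings(vector_ids, keyword_ids, alpha=0.5, k=60):
    # Each list is ordered best-first. A document earns alpha / (k + rank)
    # from the vector list and (1 - alpha) / (k + rank) from the keyword list.
    scores = {}
    for rank, doc_id in enumerate(vector_ids, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha / (k + rank)
    for rank, doc_id in enumerate(keyword_ids, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1.0 - alpha) / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Q2 from the toy example: vector search returns A, B, C; BM25 returns only B.
vector_ranking = ["A", "B", "C"]
keyword_ranking = ["B"]

print(fuse_rankings(vector_ranking, keyword_ranking, alpha=1.0))
print(fuse_rankings(vector_ranking, keyword_ranking, alpha=0.5))
```

With alpha=1.0 the keyword list has no weight and the vector order is returned unchanged. At alpha=0.5, B collects credit from both lists and moves to the top. Only ranks enter the calculation, so the incompatible score scales never have to be reconciled.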
Tuning alpha
The single most important quality knob in hybrid search is alpha. Tuning it is empirical, but the methodology is the same as embedding model evaluation: build a small (query, expected-doc) test set and measure MRR or recall at each alpha value.
def evaluate_alpha(test_set, alphas=(0.0, 0.25, 0.5, 0.75, 1.0)):
    results = {}
    for alpha in alphas:
        reciprocal_ranks = []
        for query, query_vec, gold_id in test_set:
            hits = search_hybrid(query, query_vec, alpha=alpha, limit=20)
            ids = [h["doc_id"] for h in hits]
            try:
                rank = ids.index(gold_id) + 1
                reciprocal_ranks.append(1.0 / rank)
            except ValueError:
                reciprocal_ranks.append(0.0)
        results[alpha] = sum(reciprocal_ranks) / len(reciprocal_ranks)
    return results
What I’ve seen in practice:
- Pure technical docs (API references, error codes, function names) want alpha around 0.3-0.4. Keyword matching dominates.
- Conceptual content (tutorials, blog posts, conversational text) wants alpha around 0.6-0.7. Semantic matching dominates.
- Mixed corpora often settle around 0.5, and the gains from optimizing aren’t huge.
Don’t pick alpha based on a single anecdote. I have watched teams set alpha to 0.7 because “vector search is the modern way” and ship worse retrieval than they had with pure BM25. Measure.
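If MRR is unfamiliar, it helps to check the arithmetic by hand on a tiny case before trusting a sweep. The sketch below is a standalone helper with made-up rankings (the function name and data are illustrative only): the gold document is ranked first for one query, third for another, and missing for the last.

```python
def mean_reciprocal_rank(ranked_ids_per_query, gold_ids):
    # 1 / rank of the gold document per query, 0 if it was not retrieved,
    # averaged over all queries.
    total = 0.0
    for ranked_ids, gold_id in zip(ranked_ids_per_query, gold_ids):
        if gold_id in ranked_ids:
            total += 1.0 / (ranked_ids.index(gold_id) + 1)
    return total / len(gold_ids)


rankings = [
    ["B", "A", "C"],  # gold at rank 1 -> 1.0
    ["A", "C", "B"],  # gold at rank 3 -> 1/3
    ["A", "C"],       # gold missing   -> 0.0
]
gold = ["B", "B", "B"]

mrr = mean_reciprocal_rank(rankings, gold)
print(round(mrr, 3))  # (1 + 1/3 + 0) / 3 = 0.444
```

A miss costs as much as a perfect hit gains, which is why a handful of failed exact-match queries can drag the score for a high alpha well below what the semantic queries alone would suggest.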
Filtering
Weaviate’s filter language is the GraphQL where filter, and it composes cleanly with hybrid queries:
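def search_hybrid_filtered(query, query_vector, sources):
    # ContainsAny is not available in 1.18, so match any of the sources
    # with an Or over one Equal clause per source. Assumes sources is non-empty.
    clauses = [
        {"path": ["source"], "operator": "Equal", "valueString": s}
        for s in sources
    ]
    where_filter = (
        clauses[0]
        if len(clauses) == 1
        else {"operator": "Or", "operands": clauses}
    )
    result = (
        client.query
        .get("Doc", ["title", "content", "url", "doc_id"])
        .with_hybrid(query=query, vector=query_vector, alpha=0.5)
        .with_where(where_filter)
        .with_limit(10)
        .do()
    )
    return result["data"]["Get"]["Doc"]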
Filters are applied before the hybrid scoring, which is what you want. The Weaviate hybrid search docs cover the full operator list.
Sharding and replication
Weaviate 1.18 supports sharding for write scalability and replication for read scalability. Replication is the bigger deal for hybrid search workloads, because the read path can be CPU-intensive.
class_with_replication = {
    "class": "Doc",
    ...
    "replicationConfig": {"factor": 3},
    "shardingConfig": {"desiredCount": 4},
}
replicationConfig.factor=3 means each shard is replicated 3 times. Combined with 4 shards, that’s 12 shard replicas distributed across your cluster. The cluster needs at least 3 nodes to honor replication factor 3.
Replication in 1.18 is still labeled as a beta feature in some docs. I’ve run it in production without incident, but I’d test failover scenarios before relying on it for SLA-critical workloads.
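The capacity arithmetic is simple enough to script when you are sizing a cluster. This is a back-of-the-envelope sketch: the function name is hypothetical, and it assumes replicas spread evenly across nodes, which actual placement may not guarantee.

```python
def replica_layout(desired_shards, replication_factor, node_count):
    # A replication factor of N needs at least N nodes, since the copies
    # of a shard must live on different nodes.
    if node_count < replication_factor:
        raise ValueError("need at least as many nodes as the replication factor")
    total = desired_shards * replication_factor
    return {
        "shard_replicas": total,
        # Assumes an even spread; real placement may be uneven.
        "avg_replicas_per_node": total / node_count,
    }


layout = replica_layout(desired_shards=4, replication_factor=3, node_count=3)
print(layout)
```

With 4 shards, factor 3, and 3 nodes, every node ends up holding a copy of every shard, so each node carries the full dataset.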
Common Pitfalls
- Using string for searchable fields. I’ve seen this bug catch teams more than once. string isn’t lowercased or split on punctuation the way text is, so BM25 misses most word-level matches. If users will search the field by keyword, make it text.
- Forgetting to provide a vector with the hybrid query. If you don’t pass vector, Weaviate tries to vectorize the query using the configured module. With vectorizer: "none", that fails silently into a pure BM25 query.
- Comparing scores across queries. The fused score from hybrid search is not calibrated. A score of 0.8 on one query and 0.4 on another doesn’t mean the first is “more relevant.” Compare within a query, not across.
- Ignoring tokenization for non-English text. BM25 in Weaviate uses a default tokenizer that’s English-biased. For CJK languages, configure the right tokenizer per property; otherwise BM25 effectively doesn’t work.
- Using hybrid for everything. If your queries are uniformly natural-language paraphrase queries against natural-language documents, pure vector search is often equivalent and cheaper to reason about. Hybrid earns its keep when there’s exact-match in the query distribution.
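The missing-vector pitfall is cheap to guard against in application code. The sketch below is one way to fail loudly before a query is sent; the helper name and the expected_dim parameter are hypothetical, and the dimension you check against depends on your embedding model.

```python
def checked_query_vector(query_vector, expected_dim):
    # Refuse to run a "hybrid" query that would quietly become keyword-only.
    if not query_vector:
        raise ValueError("hybrid query needs a query vector; got none")
    if len(query_vector) != expected_dim:
        raise ValueError(
            f"query vector has {len(query_vector)} dimensions, "
            f"expected {expected_dim}"
        )
    return list(query_vector)


vector = checked_query_vector([0.1, 0.2, 0.3], expected_dim=3)
print(len(vector))
```

Calling this right before building the hybrid query turns a silent quality regression into an immediate, debuggable error.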
Wrapping Up
Hybrid search is the right default for any retrieval system that handles mixed query types, and Weaviate’s hybrid implementation is the most usable one in open source right now. The alpha parameter is the knob worth tuning; everything else is reasonable out of the box.
The next post is the lightweight counterpoint to all of this: Chroma for local-first prototyping, where you don’t need any of this infrastructure at all.