Weaviate 1.18 and Hybrid Search, When Keyword and Vector Search Are Both Right

April 20, 2023 · 7 min read · by Muhammad Amal

TL;DR: Pure vector search misses exact-match queries (product codes, names, technical terms). BM25 alone misses paraphrases. Hybrid search blends both with a single alpha knob. Weaviate 1.18 ships hybrid search as a first-class query, which is rare among vector DBs in April 2023. Tune alpha empirically; the right value varies widely by domain. Start at 0.5 and adjust based on a held-out test set.

The first time I deployed pure vector search for an internal docs site, the feedback was uniformly positive on semantic queries (“how do I authenticate to the API”) and uniformly negative on lookup queries (“ERR_AUTH_401”). The second category just didn’t work, because the embedding model doesn’t know that ERR_AUTH_401 is a specific token that should match exactly.

Hybrid search fixes this. You run vector retrieval and keyword retrieval in parallel, then fuse the scores. Weaviate 1.18 makes this a one-line query, which is genuinely useful and rare in the vector DB space right now. The Milvus production post from earlier this week covers the alternative for teams who need bigger scale; Weaviate is what I’d reach for first when scale isn’t the constraint.

This post walks through Weaviate setup, schema design, hybrid query mechanics, and the alpha-tuning process that actually matters for quality.

Why hybrid

A toy example to make the case concrete. Consider these documents:

A: “Authenticate to the REST API using your API key in the Authorization header.”
B: “When you see error code 401, your API key is invalid or missing.”
C: “Rate limits are enforced per API key.”

And these queries:

Q1: “how do I log in”
Q2: “ERR_AUTH_401”

Pure vector search on Q1 retrieves A, B, C in roughly that order, because all three are semantically related to authentication. Pure BM25 on Q1 retrieves nothing useful, because none of the documents contain “log in.”

Pure vector search on Q2 retrieves the documents in roughly the same order, because the embedding model doesn’t know that “401” is a meaningful exact-match token. Pure BM25 on Q2 retrieves B first, because it’s the only document containing “401.”

Hybrid search ranks the right document first for both queries: A for Q1 and B for Q2.
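
The keyword half of that comparison is easy to check by hand. The sketch below is not Weaviate’s BM25; it only looks for shared terms, assuming a word tokenizer that lowercases and splits on non-alphanumeric characters, plus a tiny illustrative stopword list. Both are assumptions made for this example.

```python
import re

DOCS = {
    "A": "Authenticate to the REST API using your API key in the Authorization header.",
    "B": "When you see error code 401, your API key is invalid or missing.",
    "C": "Rate limits are enforced per API key.",
}

# Illustrative stopword list, not Weaviate's actual preset.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "or", "the", "to"}


def tokenize(text: str) -> set[str]:
    """Lowercase, split on anything that is not a letter or digit, drop stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def keyword_matches(query: str) -> list[str]:
    """Return IDs of documents sharing at least one term with the query."""
    terms = tokenize(query)
    return [doc_id for doc_id, text in DOCS.items() if terms & tokenize(text)]


print(keyword_matches("how do I log in"))  # no shared terms
print(keyword_matches("ERR_AUTH_401"))     # only B contains "401"
```

A keyword index has nothing to offer on Q1 and exactly the right answer on Q2, which is the gap the vector side fills.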

Setting up Weaviate

Weaviate runs as a single Go binary with its HNSW index built in. Docker Compose is the easy local path; for production, the official Helm chart works and is lighter weight than Milvus’s.

# docker-compose.yml
version: "3.4"
services:
  weaviate:
    image: semitechnologies/weaviate:1.18.3
    ports:
      - 8080:8080
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: "true"
      PERSISTENCE_DATA_PATH: "/var/lib/weaviate"
      DEFAULT_VECTORIZER_MODULE: "none"
      ENABLE_MODULES: ""
      CLUSTER_HOSTNAME: "node1"
    volumes:
      - weaviate_data:/var/lib/weaviate
volumes:
  weaviate_data:
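
Once the container is up, it helps to wait for Weaviate to report ready before creating the schema. A minimal sketch using only the standard library, polling the /v1/.well-known/ready endpoint; the helper names, timeout, and polling interval are assumptions for illustration.

```python
import time
import urllib.request


def ready_url(base_url: str) -> str:
    """Build the readiness probe URL for a Weaviate base URL."""
    return base_url.rstrip("/") + "/v1/.well-known/ready"


def wait_until_ready(base_url: str, timeout_s: float = 30.0, interval_s: float = 1.0) -> bool:
    """Poll the readiness endpoint until it answers 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(ready_url(base_url), timeout=2) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # not listening yet, or not ready yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling wait_until_ready("http://localhost:8080") right after docker compose up avoids the connection errors that tend to appear in the first few seconds.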

Setting DEFAULT_VECTORIZER_MODULE: "none" means I’ll provide vectors at upload time rather than letting Weaviate call out to an inference service. Weaviate has built-in modules for OpenAI, Cohere, and Hugging Face, but I prefer to keep embedding out of the database layer. It’s cleaner operationally and lets me change embedding providers without touching the database configuration; the vectors themselves still have to be regenerated and re-uploaded when the model changes.
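
One way to keep that separation explicit is to hide the provider behind a small interface, so the ingestion code only ever sees a list of floats. The sketch below is hypothetical: Embedder, HashingEmbedder, and attach_vectors are names invented here, and the hashing embedder is a stand-in for a real model, useful only for wiring tests.

```python
import hashlib
from typing import Protocol


class Embedder(Protocol):
    def embed(self, text: str) -> list[float]: ...


class HashingEmbedder:
    """Stand-in embedder: deterministic, but carries no semantic meaning."""

    def __init__(self, dim: int = 8):
        self.dim = dim

    def embed(self, text: str) -> list[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Map the first `dim` bytes to floats in [0, 1].
        return [b / 255 for b in digest[: self.dim]]


def attach_vectors(texts: list[str], embedder: Embedder) -> list[dict]:
    """Pair each text with its vector, ready to hand to the batch importer."""
    return [{"content": t, "vector": embedder.embed(t)} for t in texts]


rows = attach_vectors(["Rate limits are enforced per API key."], HashingEmbedder())
print(len(rows[0]["vector"]))  # 8
```

Swapping providers then means writing another class with the same embed method and leaving the ingestion code alone.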

Schema and data model

Weaviate’s data model is class-oriented. A class is roughly a table, with properties (columns) and an optional vector per object.

import weaviate

client = weaviate.Client("http://localhost:8080")

doc_class = {
    "class": "Doc",
    "vectorizer": "none",
    "vectorIndexType": "hnsw",
    "vectorIndexConfig": {
        "distance": "cosine",
        "efConstruction": 200,
        "maxConnections": 16,
        "ef": -1,
    },
    "invertedIndexConfig": {
        "bm25": {"b": 0.75, "k1": 1.2},
    },
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "content", "dataType": ["text"]},
        {"name": "source", "dataType": ["string"]},
        {"name": "url", "dataType": ["string"]},
        {"name": "doc_id", "dataType": ["string"]},
    ],
}

if client.schema.exists("Doc"):
    client.schema.delete_class("Doc")
client.schema.create_class(doc_class)

Two things to note. First, text and string are different types. text is tokenized and BM25-indexed. string is treated as an atomic value (good for IDs, source names, anything you’ll filter on). If you want hybrid search to work on a field, it has to be text.

Second, the BM25 parameters are tunable per class. The defaults (k1=1.2, b=0.75) are reasonable and rarely worth touching unless you’ve benchmarked alternatives.
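
For intuition about what k1 and b control, here is the textbook Okapi BM25 score for a single query term. This is the standard formula with one common IDF variant, not Weaviate’s source code, and the function name is made up for this sketch.

```python
import math


def bm25_term_score(tf: int, doc_len: int, avg_doc_len: float,
                    df: int, n_docs: int,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """Score one term in one document.

    tf: occurrences of the term in the document
    df: number of documents containing the term
    k1: how quickly repeated occurrences saturate
    b:  how strongly long documents are penalized (0 disables it)
    """
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)


short = bm25_term_score(tf=1, doc_len=50, avg_doc_len=100, df=1, n_docs=1000)
long_ = bm25_term_score(tf=1, doc_len=400, avg_doc_len=100, df=1, n_docs=1000)
print(short > long_)  # True: same match, but the longer document scores lower
```

Raising k1 lets repeated terms keep adding to the score for longer; lowering b reduces the penalty on long documents.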

Ingestion

The batch API is the right way to ingest at scale. Weaviate’s Python client handles the batching cleanly:

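def log_batch_errors(results):
    # A no-op callback would hide failed objects, so surface per-object errors.
    for item in results or []:
        errors = item.get("result", {}).get("errors")
        if errors:
            print(f"batch error: {errors}")

client.batch.configure(
    batch_size=100,
    dynamic=True,
    timeout_retries=3,
    callback=log_batch_errors,
)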

with client.batch as batch:
    for doc in documents:
        batch.add_data_object(
            data_object={
                "title": doc.title,
                "content": doc.content,
                "source": doc.source,
                "url": doc.url,
                "doc_id": doc.id,
            },
            class_name="Doc",
            uuid=deterministic_uuid(doc.id),
            vector=doc.vector,
        )

deterministic_uuid should be a stable hash of the source ID. Weaviate uses UUIDs as primary keys, and stable UUIDs make re-ingestion idempotent.

import uuid

def deterministic_uuid(source_id: str) -> str:
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, source_id))
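
To see why stable UUIDs make re-ingestion idempotent, here is a small simulation that uses a plain dict in place of Weaviate. The store and the ingest helper are hypothetical; only uuid.uuid5 is the same call as above.

```python
import uuid


def deterministic_uuid(source_id: str) -> str:
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, source_id))


def ingest(store: dict, docs: list[dict]) -> None:
    """Upsert by UUID: writing the same source ID twice overwrites, never duplicates."""
    for doc in docs:
        store[deterministic_uuid(doc["id"])] = doc


store: dict = {}
docs = [{"id": "kb-1", "title": "Auth"}, {"id": "kb-2", "title": "Rate limits"}]
ingest(store, docs)
ingest(store, docs)  # second run, e.g. after a crawler retry
print(len(store))  # 2
```

With random UUIDs the second run would have doubled the object count.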

dynamic=True lets the client adjust batch size based on observed latency, which is the right default for production.

Hybrid query

The hybrid query syntax is GraphQL-flavored:

def search_hybrid(query: str, query_vector: list[float], alpha: float = 0.5, limit: int = 10):
    result = (
        client.query
        .get("Doc", ["title", "content", "source", "url", "doc_id"])
        .with_hybrid(
            query=query,
            vector=query_vector,
            alpha=alpha,
        )
        .with_limit(limit)
        .with_additional(["score", "explainScore"])
        .do()
    )
    return result["data"]["Get"]["Doc"]

alpha is the fusion weight. alpha=0 is pure BM25. alpha=1 is pure vector. The function above defaults to 0.5; if you omit alpha entirely, Weaviate falls back to its own default of 0.75. The actual fusion is rank-based (reciprocal rank fusion, RRF) under the hood, which is more robust than simple linear score blending because BM25 scores and vector distances are on scales that are not directly comparable.
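
To make the role of alpha concrete, here is a sketch of alpha-weighted reciprocal rank fusion over two ranked ID lists. It assumes the conventional RRF constant k=60 and a simple weighted sum; fuse_ranked is a name invented for this example, and Weaviate’s internal implementation may differ in its details.

```python
def fuse_ranked(bm25_ids: list[str], vector_ids: list[str],
                alpha: float = 0.5, k: int = 60) -> list[str]:
    """Combine two rankings; only positions matter, raw scores are ignored."""
    scores: dict[str, float] = {}
    for rank, doc_id in enumerate(bm25_ids):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) / (k + rank + 1)
    for rank, doc_id in enumerate(vector_ids):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha / (k + rank + 1)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Q2 from the toy example: BM25 finds only B, vector search returns A, B, C.
print(fuse_ranked(["B"], ["A", "B", "C"], alpha=0.5))  # ['B', 'A', 'C']
print(fuse_ranked(["B"], ["A", "B", "C"], alpha=1.0))  # ['A', 'B', 'C']
```

Because B appears in both lists, it collects contributions from each side and moves to the top as soon as the keyword side has any weight.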

with_additional(["score", "explainScore"]) returns the fused score and a per-result explanation, which is invaluable when debugging quality issues.

Tuning alpha

The single most important quality knob in hybrid search is alpha. Tuning it is empirical, but the methodology is the same as embedding model evaluation: build a small (query, expected-doc) test set and measure MRR or recall at each alpha value.

def evaluate_alpha(test_set, alphas=(0.0, 0.25, 0.5, 0.75, 1.0)):
    results = {}
    for alpha in alphas:
        reciprocal_ranks = []
        for query, query_vec, gold_id in test_set:
            hits = search_hybrid(query, query_vec, alpha=alpha, limit=20)
            ids = [h["doc_id"] for h in hits]
            try:
                rank = ids.index(gold_id) + 1
                reciprocal_ranks.append(1.0 / rank)
            except ValueError:
                reciprocal_ranks.append(0.0)
        results[alpha] = sum(reciprocal_ranks) / len(reciprocal_ranks)
    return results

What I’ve seen in practice:

Pure technical docs (API references, error codes, function names) want alpha around 0.3-0.4. Keyword matching dominates.
Conceptual content (tutorials, blog posts, conversational text) wants alpha around 0.6-0.7. Semantic matching dominates.
Mixed corpora often settle around 0.5, and the gains from optimizing aren’t huge.

Don’t pick alpha based on a single anecdote. I have watched teams set alpha to 0.7 because “vector search is the modern way” and ship worse retrieval than they had with pure BM25. Measure.
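
Two small helpers round out the evaluation loop: recall@k as a second metric next to MRR, and a picker that reads the dictionary returned by evaluate_alpha. Both function names are invented for this sketch, and the scores in the usage example are made-up placeholders, not measurements.

```python
def recall_at_k(ranked_ids: list[str], gold_id: str, k: int) -> float:
    """1.0 if the expected document appears in the top k results, else 0.0."""
    return 1.0 if gold_id in ranked_ids[:k] else 0.0


def best_alpha(results: dict[float, float]) -> float:
    """Return the alpha with the highest metric value."""
    return max(results, key=results.get)


# Placeholder numbers purely to show the call shape.
example_results = {0.0: 0.61, 0.25: 0.70, 0.5: 0.74, 0.75: 0.69, 1.0: 0.58}
print(best_alpha(example_results))  # 0.5
print(recall_at_k(["d3", "d7", "d1"], gold_id="d1", k=2))  # 0.0
```

If two alphas score within noise of each other on a small test set, treat them as equivalent rather than chasing the maximum.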

Filtering

Weaviate’s filter language is the GraphQL where filter, and it composes cleanly with hybrid queries:

def search_hybrid_filtered(query, query_vector, sources):
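    # ContainsAny is not available on 1.18 (it arrived in a later release),
    # so OR together one Equal clause per source instead.
    clauses = [
        {"path": ["source"], "operator": "Equal", "valueString": s}
        for s in sources
    ]
    where_filter = (
        clauses[0] if len(clauses) == 1
        else {"operator": "Or", "operands": clauses}
    )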
    result = (
        client.query
        .get("Doc", ["title", "content", "url", "doc_id"])
        .with_hybrid(query=query, vector=query_vector, alpha=0.5)
        .with_where(where_filter)
        .with_limit(10)
        .do()
    )
    return result["data"]["Get"]["Doc"]

Filters are applied before the hybrid scoring, which is what you want. The Weaviate hybrid search docs cover the full operator list.
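
Why filtering first is what you want: if the filter ran after the top-k cut, matching documents ranked below the cut would be lost. The simulation below uses hard-coded, made-up scores in place of a real hybrid query, and both helper names are hypothetical.

```python
# (doc_id, source, score) triples standing in for a scored candidate list.
CANDIDATES = [
    ("d1", "blog", 0.9),
    ("d2", "blog", 0.8),
    ("d3", "api-ref", 0.7),
    ("d4", "blog", 0.6),
    ("d5", "api-ref", 0.5),
]


def top_k(rows, k):
    return sorted(rows, key=lambda r: r[2], reverse=True)[:k]


def pre_filter(rows, source, k):
    """Filter first, then rank: every returned slot is a matching document."""
    return top_k([r for r in rows if r[1] == source], k)


def post_filter(rows, source, k):
    """Rank first, then filter: can return fewer than k results."""
    return [r for r in top_k(rows, k) if r[1] == source]


print([r[0] for r in pre_filter(CANDIDATES, "api-ref", 2)])   # ['d3', 'd5']
print([r[0] for r in post_filter(CANDIDATES, "api-ref", 2)])  # []
```

With a limit of 2, post-filtering returns nothing even though two matching documents exist.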

Sharding and replication

Weaviate 1.18 supports sharding for write scalability and replication for read scalability. Replication is the bigger deal for hybrid search workloads, because the read path can be CPU-intensive.

class_with_replication = {
    **doc_class,  # same settings as the Doc class defined earlier
    "replicationConfig": {"factor": 3},
    "shardingConfig": {"desiredCount": 4},
}

replicationConfig.factor=3 means each shard exists as 3 copies. Combined with 4 shards, that’s 4 × 3 = 12 shard replicas distributed across your cluster. The cluster needs at least 3 nodes to honor replication factor 3, because copies of the same shard have to live on different nodes.

Replication in 1.18 is still labeled as a beta feature in some docs. I’ve run it in production without incident, but I’d test failover scenarios before relying on it for SLA-critical workloads.
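
The replica arithmetic generalizes to other configurations. A small helper, with invented names, that computes the total number of shard replicas and the average load per node, and rejects a replication factor the cluster cannot honor:

```python
def replica_plan(shards: int, factor: int, nodes: int) -> dict:
    """Summarize how many shard replicas a class creates across a cluster."""
    if nodes < factor:
        raise ValueError(
            f"replication factor {factor} needs at least {factor} nodes, got {nodes}"
        )
    total = shards * factor
    return {"total_replicas": total, "avg_per_node": total / nodes}


print(replica_plan(shards=4, factor=3, nodes=3))
# {'total_replicas': 12, 'avg_per_node': 4.0}
```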

Common Pitfalls

Using string for searchable fields. I’ve seen this bug catch teams more than once. string doesn’t tokenize, so BM25 can’t match individual words. If users will search the field by keyword, make it text.
Forgetting to provide a vector with the hybrid query. If you don’t pass vector, Weaviate tries to vectorize the query using the configured module. With vectorizer: "none", that fails silently into a pure BM25 query.
Comparing scores across queries. The fused score from hybrid search is not calibrated. A score of 0.8 on one query and 0.4 on another doesn’t mean the first is “more relevant.” Compare within a query, not across.
Ignoring tokenization for non-English text. BM25 in Weaviate uses a default tokenizer that’s English-biased. For CJK languages, configure the right tokenizer per property; otherwise BM25 effectively doesn’t work.
Using hybrid for everything. If your queries are uniformly natural-language paraphrase queries against natural-language documents, pure vector search is often equivalent and cheaper to reason about. Hybrid earns its keep when there’s exact-match in the query distribution.
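
A quick way to judge whether exact-match queries are present in your traffic is to scan a query log with a rough heuristic. The patterns below are assumptions chosen for illustration (identifiers with underscores or digits, all-caps codes); adjust them to your own domain.

```python
import re

# Heuristic: tokens that look like identifiers or codes rather than prose.
LOOKUP_PATTERN = re.compile(
    r"\b(?:[A-Za-z]+_[A-Za-z0-9_]+"   # snake_case or ERR_STYLE identifiers
    r"|[A-Za-z]*\d+[A-Za-z0-9]*"      # anything containing a digit, e.g. 401, v2
    r"|[A-Z]{3,})\b"                  # all-caps codes such as EACCES
)


def looks_like_lookup(query: str) -> bool:
    return bool(LOOKUP_PATTERN.search(query))


def lookup_share(queries: list[str]) -> float:
    """Fraction of queries that contain a lookup-style token."""
    if not queries:
        return 0.0
    return sum(looks_like_lookup(q) for q in queries) / len(queries)


log = ["how do I log in", "ERR_AUTH_401", "rotate api key", "error 401 on upload"]
print(lookup_share(log))  # 0.5
```

A share near zero suggests pure vector search may be enough; a meaningful share is the signal that hybrid is worth the extra tuning.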

Wrapping Up

Hybrid search is the right default for any retrieval system that handles mixed query types, and Weaviate’s hybrid implementation is the most usable one in open source right now. The alpha parameter is the knob worth tuning; everything else is reasonable out of the box.

The next post is the lightweight counterpoint to all of this: Chroma for local-first prototyping, where you don’t need any of this infrastructure at all.
