Postgres Partial Indexes That Make Slow Queries Disappear
TL;DR — A partial index includes only the rows matching a WHERE clause. Tiny, fast, cheap to maintain. Use when only a slice of rows is ever queried — pending jobs, soft-deleted excluded, status=‘active’, recent records. The most underused Postgres feature.
After the index-type tour, the next thing to know about indexing is partial indexes — a feature that’s been in Postgres since 7.x but is shockingly absent from most production schemas I see.
A partial index includes only the rows that match a WHERE clause defined when you create it. Smaller, faster to scan, cheaper to maintain. For workloads where the relevant rows are a small fraction of the table, partial indexes are often the single highest-leverage optimization available.
The example: a job queue
Classic case. A jobs table with 50 million historical rows, but the only queries that actually run are “find pending jobs”:
CREATE TABLE jobs (
id bigserial PRIMARY KEY,
payload jsonb NOT NULL,
status text NOT NULL, -- 'pending', 'running', 'done', 'failed'
scheduled_at timestamptz NOT NULL,
created_at timestamptz NOT NULL DEFAULT now()
);
The hot query, run thousands of times per minute by workers polling for work:
SELECT id, payload FROM jobs
WHERE status = 'pending' AND scheduled_at <= now()
ORDER BY scheduled_at
LIMIT 10
FOR UPDATE SKIP LOCKED;
A naive B-tree:
CREATE INDEX jobs_status_scheduled_idx ON jobs (status, scheduled_at);
Works, but indexes all 50 million rows. The index is large. Every worker poll scans the status = 'pending' portion of a giant index.
A partial index:
CREATE INDEX jobs_pending_scheduled_idx
ON jobs (scheduled_at)
WHERE status = 'pending';
Only the ~1000 pending rows are indexed. Index is tiny. Worker polls are essentially instant. Updates to the 49.999M completed/failed rows don’t touch this index at all.
Three orders of magnitude smaller. One order of magnitude faster. Free maintenance win.
The four patterns that recur
Almost every production partial-index opportunity falls into one of these four shapes.
Pattern 1 — “Only the active ones”
CREATE INDEX users_active_email_idx
ON users (email)
WHERE deleted_at IS NULL;
Soft-deleted users dominate the table over time. The index only cares about the live ones.
Pattern 2 — “Only one status”
CREATE INDEX subscriptions_active_renewal_idx
ON subscriptions (renews_at)
WHERE status = 'active';
Subscriptions table grows forever. Only active subscriptions matter for the daily renewal job.
Pattern 3 — “Only recent rows”
CREATE INDEX events_recent_user_idx
ON events (user_id, created_at DESC)
WHERE created_at > '2022-01-01';
A bit different — this is “rolling window” data. The cutoff is static here; for true rolling windows you’d rebuild the index periodically. Still useful when 95% of queries hit recent data.
Pattern 4 — “Only the non-NULL ones”
CREATE INDEX orders_refunded_at_idx
ON orders (refunded_at)
WHERE refunded_at IS NOT NULL;
Most orders aren’t refunded. The few that are need to be quickly findable by refund date.
How the planner uses partial indexes
The planner can use a partial index for a query only if the query’s WHERE is provably implied by the index’s WHERE.
-- Index: WHERE status = 'pending'
CREATE INDEX jobs_pending_idx ON jobs (scheduled_at) WHERE status = 'pending';
-- This query CAN use the index (WHERE matches):
SELECT * FROM jobs WHERE status = 'pending' AND scheduled_at < now();
-- This one CANNOT (WHERE doesn't include status = 'pending'):
SELECT * FROM jobs WHERE scheduled_at < now();
-- This one CANNOT (different predicate):
SELECT * FROM jobs WHERE status IN ('pending', 'running');
The planner is conservative. If it’s not 100% sure your query’s WHERE implies the index’s WHERE, it won’t use the index. Test with EXPLAIN ANALYZE.
Composite partial indexes
You can index multiple columns, partial on one of them:
CREATE INDEX subscriptions_active_customer_renewal_idx
ON subscriptions (customer_id, renews_at)
WHERE status = 'active';
Now queries that filter by customer + renewal time + active status hit a tiny index. We’ll go deeper into multi-column ordering in Friday’s post.
What partial indexes don’t help with
Partial indexes are for queries with selective, stable WHERE conditions. If your queries hit the broad table without those conditions, no partial index helps.
- Don’t use partial indexes when ~all rows match the partial condition — you’ve just made a normal index with extra cognitive overhead.
- Don’t use them with conditions that constantly change (e.g.,
WHERE created_at > now() - interval '7 days'— that’s a moving target; the planner can’t use it). - Don’t use them as a substitute for fixing a fundamentally bad query plan.
Size comparison
For our 50M-row jobs table where 1000 rows are pending at any time:
-- Full index
SELECT pg_size_pretty(pg_relation_size('jobs_status_scheduled_idx'));
-- e.g. 1542 MB
-- Partial index
SELECT pg_size_pretty(pg_relation_size('jobs_pending_scheduled_idx'));
-- e.g. 56 kB
That’s not a typo. Kilobytes vs gigabytes for the same query’s coverage. And every write to a non-pending row touches zero of the partial index.
Migration tactics
If you’ve inherited a schema with bloated full indexes that could be partial, the migration is straightforward:
-- Create the partial first (CONCURRENTLY, no lock)
CREATE INDEX CONCURRENTLY jobs_pending_scheduled_idx_v2
ON jobs (scheduled_at)
WHERE status = 'pending';
-- Verify it's being used
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM jobs WHERE status = 'pending' ORDER BY scheduled_at LIMIT 10;
-- Drop the old (CONCURRENTLY, no lock)
DROP INDEX CONCURRENTLY jobs_status_scheduled_idx;
CONCURRENTLY avoids locking the table during the operation, but takes longer. Worth it in production.
Unique partial indexes
A surprisingly useful variant. Suppose you want emails to be unique among active users only, but allow soft-deleted users to keep their (now-stale) email:
CREATE UNIQUE INDEX users_email_active_unique
ON users (email)
WHERE deleted_at IS NULL;
Soft-delete the user, free up the email, allow reuse. Without partial unique, you’d have to mangle the deleted user’s email or use a more complex scheme.
Common Pitfalls
Predicate that doesn’t match queries exactly. Easy to miss by a token: WHERE status = 'pending' in the index won’t help WHERE status IN ('pending') — actually the planner can see through that one, but WHERE status != 'done' it cannot. Test with EXPLAIN.
Stale stats. Partial indexes need ANALYZE after creation just like full indexes. Otherwise the planner’s selectivity estimate is wrong.
Forgetting that the partial condition applies to the index, not the query. A query without the partial condition will scan the full table or use a different index. That’s fine if it happens once a week. Bad if it happens on a hot path.
Functional partial indexes that the planner can’t see through. WHERE lower(email) LIKE '...' needs an index on lower(email), not just email. Same with partial: predicate must syntactically match what the query uses, in many cases.
Too many overlapping partial indexes. “Index for pending jobs”, “index for running jobs”, “index for done jobs” → you’ve reconstructed the full index in pieces and made maintenance worse. Pick the partial that matches the hot query; don’t proliferate.
Forgetting partial indexes exist. I’ve found 10× wins on schemas other engineers have been tuning for months, just by switching a full index to partial. It’s the most overlooked feature.
Wrapping Up
Partial indexes are tiny, fast, and basically free. If your table has rows that are never queried (soft-deleted, completed, archived), partial indexes are almost certainly the right move. Friday: multi-column index column ordering — the difference between a useful composite index and a useless one is which column you put first.