background-shape
Reading the Postgres Query Planner, A Practical EXPLAIN Walkthrough
February 4, 2022 · 6 min read · by Muhammad Amal programming

TL;DREXPLAIN (ANALYZE, BUFFERS) is the most useful diagnostic in Postgres. Read it from the inside out. Watch for sequential scans on big tables, large Rows Removed by Filter, and big gaps between estimated and actual rows. Each pattern has a specific fix.

Tuesday’s perf checklist called out EXPLAIN ANALYZE as the single most useful command for Postgres tuning. Today goes deeper: how to actually read its output, what each line means, and the three patterns that account for most slow queries in production.

I’m going to use a single example query throughout and walk through what the planner produces. By the end you should be able to look at a plan and form a real hypothesis about why a query is slow, instead of guessing at indexes and hoping.

The example schema

CREATE TABLE customers (
  id        bigserial PRIMARY KEY,
  email     text NOT NULL UNIQUE,
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE orders (
  id          bigserial PRIMARY KEY,
  customer_id bigint NOT NULL REFERENCES customers(id),
  status      text NOT NULL,
  total_cents bigint NOT NULL,
  created_at  timestamptz NOT NULL DEFAULT now()
);

CREATE INDEX orders_customer_id_idx ON orders (customer_id);

Roughly 5 million orders across 200K customers. The query under inspection:

SELECT id, total_cents, created_at
FROM orders
WHERE customer_id = 42
  AND status = 'paid'
ORDER BY created_at DESC
LIMIT 50;

Running EXPLAIN ANALYZE

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT id, total_cents, created_at
FROM orders
WHERE customer_id = 42
  AND status = 'paid'
ORDER BY created_at DESC
LIMIT 50;

Output (representative):

Limit  (cost=0.43..148.27 rows=50 width=24) (actual time=0.045..2.418 rows=50 loops=1)
  Buffers: shared hit=2400 read=12
  ->  Index Scan Backward using orders_created_at_customer_idx on orders
        (cost=0.43..3208.55 rows=1085 width=24)
        (actual time=0.044..2.412 rows=50 loops=1)
        Index Cond: (customer_id = 42)
        Filter: (status = 'paid'::text)
        Rows Removed by Filter: 178
        Buffers: shared hit=2400 read=12
Planning Time: 0.142 ms
Execution Time: 2.450 ms

Five things to read out of this. Read them in this order.

1. Read from the inside out

The deepest-indented node runs first. In this plan, the index scan runs first, then Limit consumes its output. Always read bottom-up. Top-level Execution Time is the whole query; per-node timings are cumulative to that node.

2. Cost vs actual

(cost=0.43..148.27 rows=50 width=24) is the planner’s estimate. (actual time=0.045..2.418 rows=50) is what really happened. Cost is in arbitrary units — only comparable to other plans, not meaningful in absolute terms. What matters is whether the estimated rows match actual rows.

In this plan, estimated rows = 1085 for the index scan but only 50 were actually used (because of LIMIT). That’s expected. The dangerous mismatch is when the planner thinks “1 row” and reality is “50,000” — then the planner picks a bad strategy based on a wrong assumption.

3. Buffers tells you cache hit rate

Buffers: shared hit=2400 read=12 means 2400 pages were served from shared_buffers and 12 had to be read from disk (or OS cache). On a hot query, you want read=0. If a query consistently shows high read, either the working set doesn’t fit in shared_buffers, or the query is touching cold data.

BUFFERS is opt-in for a reason: it adds info nothing else gives you. Always include it.

4. Filter vs Index Cond

Index Cond: (customer_id = 42) is what the index resolved. Filter: (status = 'paid'::text) is what was applied after the index lookup, by scanning the rows.

Rows Removed by Filter: 178 is the smoking gun. The index found 228 rows for customer 42; the filter then discarded 178 of them because their status wasn’t ‘paid’. We did 4× the I/O we needed to.

The fix: either a partial index, or a composite index that includes status:

CREATE INDEX orders_customer_status_created_idx
  ON orders (customer_id, status, created_at DESC);

Now the index can resolve both conditions and the rows it returns are already in the right order.

5. The three patterns to memorize

Once you’ve stared at a few hundred plans, three patterns recur. Memorize these.

Sequential scan on a big table:

Seq Scan on orders  (cost=0.00..125341.00 rows=4892311 width=24)
  Filter: (customer_id = 42)
  Rows Removed by Filter: 4892261

The planner read every row of orders and threw 99.999% away. Cause: no index on customer_id, or the planner ignored it because statistics are stale (ANALYZE orders). On small tables a seq scan is correct; on a 5M-row table it almost never is.

Big gap between estimated and actual rows:

Index Scan using ... (cost=0.43..8.45 rows=1 width=8) (actual ... rows=42891 loops=1)

Planner thought one row; reality returned 43K. The planner picked a strategy for “1 row” (nested loop join, probably) and is now doing 43K of them. Fix: ANALYZE table_name and bump default_statistics_target for the column. Sometimes you need to express the query differently.

Nested Loop on a big inner side:

Nested Loop  (actual time=0.5..28412.0 rows=1245 loops=1)
  ->  Seq Scan on customers  (rows=200000 loops=1)
  ->  Index Scan on orders  (rows=0.006 loops=200000)

200K outer rows × 0.006 inner = 1245 result rows. Nested loops are right for small outer sides, terrible for big ones. Fix: usually a Hash Join would be better — but the planner won’t pick one if it underestimates the outer side (see pattern 2). Often, fixing planner statistics fixes the plan.

A note on EXPLAIN without ANALYZE

EXPLAIN alone just shows the estimate. It doesn’t run the query. Use it when:

  • The query would take hours and you just want to see the plan.
  • The query has side effects (INSERT/UPDATE/DELETE) and you don’t want to run them. (EXPLAIN ANALYZE does execute, including writes. Wrap in BEGIN; ... ROLLBACK; if you want to dry-run.)
  • You want to check what the planner thinks before you actually execute.

For everything else, EXPLAIN ANALYZE gives you ground truth.

Tools that help

A handful of free tools make plans easier to read:

  • explain.depesz.com — paste plan text, get colored output highlighting expensive nodes. Old but unbeaten.
  • explain.dalibo.com — newer, prettier, more interactive. Same job.
  • pg_stat_statements — for the “top 20 slow queries” view that tells you which queries to EXPLAIN in the first place.

Use them. There’s no virtue in squinting at raw plan text when better tools exist.

Common Pitfalls

Reading plans top-down. Top is the result, bottom is where execution starts. You’ll misdiagnose every plan if you read in the wrong direction.

Ignoring loops=N in nested plans. If a node says actual time=2 ms (loops=10000), that’s 20 seconds, not 2 ms. Always multiply.

Trusting cost as a wall-clock estimate. It isn’t. Cost units are calibrated relative to a “page read” via random_page_cost and friends. Two plans’ cost numbers are comparable to each other; cost-to-real-time isn’t.

Assuming an index “should” be used. The planner will skip an index if it thinks a seq scan is cheaper — and it’s usually right, especially on small tables or high-selectivity queries. If you want to force the issue for testing, set enable_seqscan = off in your session and re-run. Never set it globally.

Running EXPLAIN ANALYZE on UPDATE/DELETE in production by accident. It runs. Wrap in a transaction with ROLLBACK or — better — test on a clone.

Wrapping Up

EXPLAIN ANALYZE is the single highest-leverage tool in Postgres. Read inside-out, check the row count gap, watch for Rows Removed by Filter, and remember that the planner’s strategy is only as good as its statistics. Next post (Monday Feb 7): the four memory settings that decide whether Postgres uses RAM or thrashes — the second item on the checklist, drilled to the bottom.