Postgres Performance Tuning, My 2022 Checklist
TL;DR — 90% of Postgres performance work in 2022 is six things: right
shared_buffers, rightwork_mem, autovacuum that keeps up, a connection pooler, the indexes the query planner actually wants, andEXPLAIN ANALYZEon the queries that matter. Measure withpg_stat_statements. Repeat.
January was the containerization month. February’s theme shifts underneath: Postgres performance tuning, advanced indexing, and CI/CD with GitHub Actions. Why Postgres now? Because the shared Postgres cluster we set up in January is starting to feel the traffic from the new Go services, and “throw a bigger instance at it” stops being clever past a point. Tuning matters.
This first post is the checklist I run through whenever a Postgres cluster starts misbehaving. It’s not exhaustive. It’s the things that, in my experience, account for 90% of the wins, in roughly the order I check them. The rest of February goes deep on the specific items.
The order of operations
When a Postgres cluster is “slow,” the temptation is to start guessing — “must be the indexes,” “must be vacuum,” “must need more RAM.” Resist. The order that finds real problems faster:
- Look at
pg_stat_statementsand find the actual top-N slow queries. - For each, run
EXPLAIN (ANALYZE, BUFFERS)and see what the planner is doing. - Check memory settings against actual instance size.
- Check autovacuum is keeping up with write volume.
- Check connection counts vs
max_connectionsand whether a pooler is in place. - Verify indexes match the workload (and aren’t bloated).
Each step takes ~30 minutes if you’ve done it before. End-to-end you have a real diagnosis in an afternoon.
Memory settings — the four numbers that matter
For a Postgres 14 instance with N GB of RAM dedicated to the database, the four settings I always check:
| Setting | Starting point |
|---|---|
shared_buffers |
25% of RAM |
effective_cache_size |
50–75% of RAM |
work_mem |
RAM / max_connections / 4 |
maintenance_work_mem |
5–10% of RAM (up to 1 GB) |
Concretely, for an 8 GB instance with 100 connections:
shared_buffers = 2GB
effective_cache_size = 6GB
work_mem = 20MB
maintenance_work_mem = 512MB
shared_buffers is the most-misunderstood. It’s not “Postgres’s cache” — the OS page cache does most of the caching. shared_buffers is where Postgres keeps dirty pages and hot data it doesn’t want to round-trip through the kernel. 25% is a safe default; bigger isn’t always better, and on smaller instances 25% may be enough.
work_mem is the sneaky one. It’s per-operation, not per-connection. A query with three sorts can use 3× work_mem. Set it small and most queries spill to disk (Sort Method: external merge Disk:); set it big and a few bad queries OOM your instance. The deeper post on memory tuning is coming Friday Feb 11.
Autovacuum: the silent killer
The number-one Postgres misconfiguration I see in production is “default autovacuum, write-heavy table.” Defaults assume balanced read/write workloads. A write-heavy table fills up with dead tuples faster than autovacuum cleans them, indexes bloat, the planner starts picking sequential scans, and everyone wonders why the database “got slow.”
Quick check:
SELECT
schemaname, relname,
n_live_tup,
n_dead_tup,
round(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_pct,
last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY dead_pct DESC NULLS LAST
LIMIT 20;
If any table is north of 20% dead tuples and last_autovacuum is hours-stale, autovacuum is not keeping up. Lower autovacuum_vacuum_scale_factor for that table (per-table setting), or globally if many tables show the same pattern. Deep dive coming Feb 9.
Connection pooling: not optional past 100 connections
Every Postgres connection is a forked process with ~10 MB overhead. 200 connections = 2 GB of memory before any query runs. Add work_mem allocations on top and you’re OOM at lunch.
The fix is a pooler. PgBouncer in transaction-pooling mode is the default. Three services × 30 app-side connections × pooler with default_pool_size = 25 per database = Postgres sees a manageable, bounded number of physical connections regardless of app-side scaling.
# pgbouncer.ini
[databases]
app = host=postgres port=5432 dbname=app
[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5
reserve_pool_timeout = 3
Full setup post on Feb 11.
Indexes: not “more is better”
A useful index speeds up reads. A useless index slows down every write, takes up disk, and clutters the planner’s choices. Two queries I run quarterly:
Unused indexes:
SELECT
schemaname, relname, indexrelname,
pg_size_pretty(pg_relation_size(indexrelid)) AS size,
idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND indexrelname NOT LIKE 'pg_%'
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;
Indexes with zero scans across weeks are candidates for dropping (after sanity-checking they’re not for a quarterly-job query). Free disk, free write speed.
Index bloat:
The pgstattuple extension can show real bloat. Cheaper proxy: any index that’s larger than its table is suspicious.
Index type matters too. B-tree is the default and right for most cases. GIN for full-text and JSONB. GiST for geometric / range types. BRIN for huge tables with naturally clustered data (timestamps, sequential IDs). The Feb 14 post goes through each.
EXPLAIN ANALYZE: the single most useful command
If you take one tool away from this checklist, take this one:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = '...' ORDER BY created_at DESC LIMIT 50;
The output tells you the planned cost, the actual time, how many rows it expected vs got, and (with BUFFERS) how much was served from shared buffers vs read from disk. Three patterns to watch for:
Seq Scanon a big table when you expected an index — missing or wrong index, or planner stats are stale.Rows Removed by Filter: <huge>— index found the rows, but the filter discarded most. Add a partial or composite index.- Big gap between
(estimated rows=X)andactual rows=Y— planner statistics are off.ANALYZEthe table, or raisedefault_statistics_target.
Pair this with pg_stat_statements for the top-N slow queries and you have a full feedback loop: find slow query → explain it → fix it → measure → repeat.
Common Pitfalls
Tuning blind. Changing settings without measuring before/after is faith-based engineering. pg_stat_statements extension (cheap, default in most managed Postgres) is the baseline. Snapshot weekly.
Copying random tuning configs from the internet. A 2-CPU 4 GB instance has different needs than a 32-CPU 256 GB one. The pgtune.leopard.in.ua calculator is a reasonable starting point. Treat its output as defaults, not gospel.
Forgetting to ANALYZE after big data changes. Bulk inserts, deletes, or schema changes invalidate planner stats. The planner will then make terrible plans until the next autovacuum cycle. ANALYZE after big jobs.
Ignoring effective_io_concurrency. On SSD/NVMe storage, the default 1 cripples bitmap heap scans. Set it to 200+ for SSD. Free perf you’d otherwise leave behind.
Treating connection pooling as optional. It is not. Past 100 connections, the per-connection overhead alone eats your RAM. Use PgBouncer (or pgcat, or RDS Proxy).
Letting autovacuum stay at defaults on write-heavy tables. Defaults are conservative. Per-table overrides cost nothing and prevent bloat.
Wrapping Up
This post is the index. The rest of February’s posts go deep on each line item: query planner, memory, autovacuum, pooling, the index-type tour, partial indexes, multi-column index ordering. By March 1 you should have a working theory of how to keep a Postgres cluster healthy without paging your DBA at 3 AM. Friday’s post: reading the query planner — what EXPLAIN is actually telling you and why most people misread it.