Advanced PostgreSQL 17 Tuning at Scale, A Practical Guide
TL;DR — Postgres 17 ships real improvements in vacuum, WAL, and IO observability, but defaults still target a laptop. Tune memory, IO concurrency, and autovacuum to match your hardware. Use
pg_stat_ioandpg_stat_statementsto drive every change you make.
I’ve been running Postgres at scale for the better part of a decade, and every major release follows the same pattern. The release notes are exciting, the social media takes are louder than they should be, and then you put it in production and discover the defaults are still tuned for somebody’s 2010 MacBook. Postgres 17 is no exception. The engine is genuinely better. The out-of-the-box configuration is not.
This guide walks through how I tune Postgres 17 on real production hardware, the kind of box with 32 to 256 GB of RAM, NVMe storage, and a workload that needs to survive Black Friday. I’ll show you the settings that matter, the diagnostic queries I run to validate them, and the gotchas that have personally cost me sleep.
If you want a refresher on the index-side of performance first, I covered that in Choosing the right Postgres index. This post assumes you already know your indexes are reasonable and your queries aren’t doing anything stupid.
1. Sizing Memory Without Lying To Yourself
The single biggest performance lever in Postgres is memory configuration. Get it wrong and nothing else matters. Get it right and most of your other problems disappear.
shared_buffers
The textbook advice is 25% of RAM. That number was correct in 2008. On modern boxes with 64 GB+ of RAM, 25% is conservative. I run 40% on dedicated database servers and as high as 50% if the working set really is large and the OS page cache isn’t doing useful work.
-- For a 128 GB box dedicated to Postgres
ALTER SYSTEM SET shared_buffers = '50GB';
ALTER SYSTEM SET effective_cache_size = '96GB';
SELECT pg_reload_conf();
-- shared_buffers requires restart
The reason you don’t just go to 80% is because Postgres uses the OS page cache as a second layer. Past about 50% you stop getting useful caching gains and start fighting the OS for memory.
work_mem
work_mem is per-operation, not per-query and not per-connection. A single query with three sorts and two hashes can allocate 5 * work_mem. Tune it accordingly.
-- Conservative starting point for OLTP
ALTER SYSTEM SET work_mem = '32MB';
-- Bump for specific analytical sessions
SET work_mem = '256MB';
In Postgres 17, the planner is meaningfully smarter about parallel hash joins, which means work_mem matters even more. I check whether queries are spilling to disk with:
SELECT query, temp_blks_written, calls
FROM pg_stat_statements
WHERE temp_blks_written > 0
ORDER BY temp_blks_written DESC
LIMIT 20;
Any query with non-zero temp_blks_written is asking for more work_mem. Either raise it globally or bump it for that session.
maintenance_work_mem
For VACUUM, CREATE INDEX, and ALTER TABLE this is your friend.
ALTER SYSTEM SET maintenance_work_mem = '2GB';
Postgres 17 made VACUUM use this memory dramatically better thanks to the new TID store. The old maintenance_work_mem = 1GB cap on vacuum dead-tuple tracking is gone. You can now set this to 4 GB or 8 GB on big boxes and vacuums will actually use it.
2. IO Concurrency and pg_stat_io
Postgres 17 ships pg_stat_io, which is the first time you can see what Postgres is actually doing to your storage without resorting to OS-level tools. This changes how you tune storage parameters.
+-----------------+ +------------------+
| Query Executor | -----> | shared_buffers |
+-----------------+ +------------------+
| ^
miss | | hit
v |
+------------------+
| OS Page Cache |
+------------------+
|
v
+------------------+
| Block Device |
+------------------+
Reading pg_stat_io
SELECT backend_type, object, context,
reads, writes, extends,
hits, evictions, reuses
FROM pg_stat_io
WHERE reads > 0 OR writes > 0
ORDER BY reads DESC;
What you’re looking for:
- High
evictionsonshared_buffersmeans your buffer pool is churning. Either raiseshared_buffersor accept the cache miss rate. - High
readsfrombulkreadcontext means sequential scans. Checkeffective_io_concurrency. - High
writesfromvacuumcontext means autovacuum is busy. Probably good news.
effective_io_concurrency
On NVMe this should be high. On spinning disks (you should not still have spinning disks) it should be low.
-- NVMe / modern cloud block storage
ALTER SYSTEM SET effective_io_concurrency = 256;
ALTER SYSTEM SET maintenance_io_concurrency = 256;
Postgres 17 uses these for the new streaming IO infrastructure. Sequential scans, vacuum, and analyze all get faster with higher concurrency because they can submit multiple async IO requests at once.
3. WAL and Checkpoints Under Load
WAL configuration is where production Postgres deployments quietly fall over. The defaults are sized for a small write workload, and once you exceed them you get checkpoint stalls, replication lag, and IO storms.
Sizing WAL
ALTER SYSTEM SET wal_buffers = '64MB';
ALTER SYSTEM SET max_wal_size = '32GB';
ALTER SYSTEM SET min_wal_size = '4GB';
ALTER SYSTEM SET checkpoint_timeout = '15min';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
max_wal_size is the soft upper bound for WAL between checkpoints. If you’re writing heavily, you want this big enough that checkpoints are triggered by checkpoint_timeout rather than WAL volume. A checkpoint triggered by WAL volume is an IO emergency. A checkpoint triggered by time is a scheduled IO event.
Monitoring checkpoint behavior
SELECT checkpoints_timed, checkpoints_req,
checkpoint_write_time, checkpoint_sync_time,
buffers_checkpoint, buffers_clean, buffers_backend
FROM pg_stat_bgwriter;
If checkpoints_req is more than 5-10% of checkpoints_timed, your max_wal_size is too small. If buffers_backend is high relative to buffers_checkpoint, your background writer can’t keep up and individual backends are flushing dirty buffers, which kills latency.
WAL compression
Postgres 17 supports wal_compression = lz4 and zstd. On most workloads lz4 is essentially free CPU and reduces WAL volume 30-50%.
ALTER SYSTEM SET wal_compression = 'lz4';
This matters for replication bandwidth, archive size, and recovery time. There’s no reason not to enable it on modern CPUs.
4. Autovacuum That Actually Keeps Up
Autovacuum defaults assume a small, low-write table. On a busy production table they’re an embarrassment.
Per-table tuning is mandatory
You cannot tune autovacuum globally and expect it to work. Hot tables need aggressive settings, cold tables need lazy settings. Here’s how I tune a high-write OLTP table:
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.02,
autovacuum_vacuum_threshold = 1000,
autovacuum_analyze_scale_factor = 0.01,
autovacuum_vacuum_cost_limit = 4000,
autovacuum_vacuum_cost_delay = 2,
autovacuum_naptime = 10
);
What this says: vacuum when 2% of rows are dead (plus 1000), analyze when 1% of rows have changed, use a high cost limit so vacuum can actually make progress, and check this table every 10 seconds.
Postgres 17 vacuum improvements
The new TID store means vacuum can track dead tuples in 60-70% less memory than Postgres 16. Combined with the IO concurrency improvements, vacuum on Postgres 17 is 2-3x faster on large tables in my benchmarks.
-- Check vacuum progress on a running vacuum
SELECT pid, datname, relid::regclass, phase,
heap_blks_total, heap_blks_scanned,
heap_blks_vacuumed, dead_tuple_bytes
FROM pg_stat_progress_vacuum;
The dead_tuple_bytes column is new in Postgres 17 and tells you how much memory the TID store is using. If it’s consistently near maintenance_work_mem, raise that setting.
Catching vacuum trouble
SELECT schemaname, relname,
n_live_tup, n_dead_tup,
round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_pct,
last_autovacuum, autovacuum_count
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY dead_pct DESC NULLS LAST
LIMIT 20;
Any table with dead_pct above 20% and a last_autovacuum older than a day is a problem. Either autovacuum is being canceled (long-running queries do this) or the settings are too lazy.
5. Connection Pooling Is Not Optional
Postgres uses a process per connection. Each backend costs roughly 10 MB of RAM at idle, more under load. At 500 connections you’ve spent 5 GB of RAM just on backends. At 5000 connections you’ve fallen over.
Use PgBouncer. Not optional. I covered the production-grade setup in Connection pooling with PgBouncer.
# pgbouncer.ini essentials for Postgres 17
[databases]
appdb = host=pg-primary port=5432 dbname=appdb
[pgbouncer]
pool_mode = transaction
max_client_conn = 10000
default_pool_size = 50
reserve_pool_size = 10
reserve_pool_timeout = 3
server_lifetime = 3600
server_idle_timeout = 600
Set max_connections on Postgres to something reasonable like 200, then let PgBouncer fan out to thousands of clients. Postgres 17 plays nicely with PgBouncer 1.24, including the new peer_id support if you’re running multiple PgBouncer instances behind a load balancer.
6. Observability That Drives Decisions
Tuning without observability is guessing. Here are the views I keep open during any tuning session.
Top slow queries
SELECT
substring(query, 1, 80) AS query_snippet,
calls,
round(total_exec_time::numeric / 1000, 2) AS total_seconds,
round(mean_exec_time::numeric, 2) AS mean_ms,
round((100.0 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS pct_total
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 15;
Postgres 17’s pg_stat_statements tracks JIT timing separately, which is useful if you’re using JIT compilation. Most OLTP workloads should disable JIT entirely:
ALTER SYSTEM SET jit = off;
JIT helps long-running analytical queries. It hurts every short OLTP query by adding compile overhead.
Lock contention
SELECT pid, usename, state,
wait_event_type, wait_event,
now() - query_start AS duration,
substring(query, 1, 100) AS query
FROM pg_stat_activity
WHERE state != 'idle'
AND wait_event_type IS NOT NULL
ORDER BY duration DESC NULLS LAST;
If you see wait_event_type = Lock repeatedly, you have lock contention. Use pg_locks to find the blocker:
SELECT blocked.pid AS blocked_pid,
blocking.pid AS blocking_pid,
blocked.query AS blocked_query,
blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
WHERE blocked.wait_event_type = 'Lock';
The full official reference for these views lives at postgresql.org/docs/17/monitoring-stats.html.
Common Pitfalls
A short list of things that have personally cost me hours.
1. Setting work_mem too high globally
You set work_mem = 1GB because one analytical query needed it, and now under load your 500 connections each allocate 5 GB across multiple operations and the box OOMs. Use session-level overrides instead.
2. Disabling autovacuum
Every junior DBA tries this once. The table fills with dead tuples, queries slow down, and eventually you need a 12-hour VACUUM FULL during peak hours. Never disable autovacuum globally. Tune it instead.
3. Trusting default checkpoint settings
The default max_wal_size = 1GB is fine for a tutorial. In production it triggers a checkpoint every few seconds under any real write load. Always raise this on busy systems.
4. Forgetting that idle transactions hold locks
A web framework that opens a transaction at request start and forgets to close it on an error will pin XIDs forever, blocking vacuum and growing the WAL. Set idle_in_transaction_session_timeout to something like 60 seconds. Postgres 17 makes this default-on at 0 (disabled), which is dangerous.
ALTER SYSTEM SET idle_in_transaction_session_timeout = '60s';
Troubleshooting
Query suddenly slow after upgrade to 17
Almost always a plan regression. Postgres 17 changed several planner heuristics around parallel hash joins and incremental sort. Run ANALYZE on affected tables and check EXPLAIN (ANALYZE, BUFFERS). If the plan looks worse, you can disable specific features per session:
SET enable_incremental_sort = off;
SET max_parallel_workers_per_gather = 0;
Use those to confirm the regression source, then file a planner workaround at the query level rather than disabling features globally.
Replication lag growing under load
Check whether your replica is CPU-bound on the WAL replay process (single-threaded) or IO-bound. Postgres 17 supports parallel apply for logical replication subscriptions, which helps logical replicas. Physical replicas are still single-threaded on apply, so the fix is faster storage on the replica.
Sudden IO storm at midnight
Almost always autovacuum hitting a wraparound vacuum on a large table. Check:
SELECT relname,
age(relfrozenxid) AS xid_age,
pg_size_pretty(pg_total_relation_size(oid)) AS size
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;
If xid_age is approaching autovacuum_freeze_max_age (default 200 million), schedule a manual vacuum during a quiet window so the emergency vacuum doesn’t fire during peak hours.
Wrapping Up
Postgres 17 is the best release in years. The vacuum improvements alone justify the upgrade for any team running tables over 100 GB. But the defaults still target the wrong machine. Memory, IO concurrency, WAL sizing, and autovacuum all need explicit tuning to match your hardware and workload.
Use pg_stat_io and pg_stat_statements to drive every decision. Don’t change settings because a blog post (including this one) told you to. Change them because a metric told you to, and measure the result. That’s the difference between tuning and cargo culting.
Next, I’d recommend reading the pg_stat_io deep dive which walks through how to actually interpret those numbers for capacity planning.