Tuning Postgres Memory, shared_buffers, work_mem, effective_cache_size

Tuning Postgres Memory, shared_buffers, work_mem, effective_cache_size

February 7, 2022 · 6 min read · by Muhammad Amal programming

TL;DR — shared_buffers = 25% of RAM. effective_cache_size = 50–75% (a hint to the planner, doesn’t allocate). work_mem = per-operation, sized to keep your common queries in memory. maintenance_work_mem = bigger, used for vacuum and index builds. Get these four right and most “slow query” tickets evaporate.

Memory is the second item on the Postgres tuning checklist. It’s also where most teams I’ve worked with leave the biggest wins on the floor. Four settings carry most of the weight. Getting them right is mechanical once you understand what each one actually does.

Let me walk through them in the order I tune them, with real numbers from an 8 GB RDS instance running a Postgres 14 workload similar to ours.

shared_buffers — the dedicated cache

The setting most people think is “Postgres’s cache.” It’s more accurate to call it the write cache: dirty pages live here until they’re flushed to disk, and hot read pages also stick around. The OS page cache caches everything else.

The 25%-of-RAM default came from years of empirical work. It’s a sweet spot for these reasons:

Too small (say 5%) and Postgres constantly bounces pages between shared_buffers and the OS cache, which is wasted CPU.
Too big (say 75%) and you starve the OS of cache, plus Postgres’s internal cache-management cost grows non-linearly past a point.

# postgresql.conf — 8 GB instance
shared_buffers = 2GB

Restart required (it’s a server-level setting).

Verify it’s working:

SELECT
  c.relname,
  pg_size_pretty(count(*) * 8192) AS buffered,
  round(100.0 * count(*) / (SELECT setting FROM pg_settings WHERE name='shared_buffers')::int, 1) AS pct_buffered
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
GROUP BY c.relname
ORDER BY 2 DESC
LIMIT 10;

The pg_buffercache extension shows what’s actually in shared_buffers. Useful for sanity-checking that hot tables are well-represented.

effective_cache_size — a hint, not an allocation

This one trips people up. effective_cache_size does not allocate memory. It tells the planner: “this is roughly how much memory you can assume is available across shared_buffers + OS page cache for caching data.” The planner uses it when deciding between index scan and sequential scan.

If you set this too low, the planner under-estimates cache and prefers seq scans (because it thinks index reads will require disk I/O). If you set it too high, the planner over-estimates and uses index plans that actually thrash.

Default: 4 GB. On any modern server that’s wrong. Set it.

effective_cache_size = 6GB   # 8 GB RAM × 75%

No restart needed; SELECT pg_reload_conf(); is enough.

work_mem — the per-operation budget

This is the one that bites. work_mem is allocated per sort, per hash, per merge, not per query and not per connection. A single query with three sorts and two hash joins can use 5 × work_mem. A connection running ten such queries serially is fine; ten connections running them simultaneously is 10 × 5 × work_mem = 50 × in flight.

The formula I use:

work_mem ≈ (RAM × 0.25) / max_connections / 4

For 8 GB RAM, 100 connections: (8192 × 0.25) / 100 / 4 = ~5 MB. Bump to 8–16 MB if you’ve got headroom; that’s where typical sorts stop spilling to disk.

work_mem = 16MB
max_connections = 100

Check whether yours is too small:

EXPLAIN (ANALYZE, BUFFERS) SELECT ... ORDER BY ...;

Sort Method: external merge  Disk: 14328kB

external merge Disk means the sort spilled to disk because work_mem wasn’t big enough. If you see this on a hot query, bump work_mem for the session (SET LOCAL work_mem = '64MB';) and re-EXPLAIN. If a session-level bump fixes it, consider whether the global setting is too low — but always weigh against the per-operation multiplier.

Per-query overrides via SET LOCAL are underused. For a known-heavy report query, set work_mem high in that transaction and don’t change the global.

maintenance_work_mem — bigger, less often

Used by VACUUM, CREATE INDEX, ALTER TABLE ADD FOREIGN KEY, and similar maintenance ops. Not used by regular query execution. Because it’s only used by one or two background processes at a time, you can afford to make it big.

maintenance_work_mem = 1GB   # for 8 GB RAM

Bigger means VACUUM finishes faster (fewer index passes) and index builds are faster. The cost is only realised during those operations, not on every query.

Putting it together for an 8 GB instance

Final config block:

# postgresql.conf
shared_buffers = 2GB
effective_cache_size = 6GB
work_mem = 16MB
maintenance_work_mem = 1GB
max_connections = 100

# Related but not memory-specific
random_page_cost = 1.1       # SSD storage
effective_io_concurrency = 200  # SSD storage

random_page_cost = 1.1 and effective_io_concurrency = 200 aren’t memory settings but ship in the same conversation: the planner assumes spinning-disk costs without them, which is wrong on SSD.

Apply:

ALTER SYSTEM SET shared_buffers = '2GB';
ALTER SYSTEM SET effective_cache_size = '6GB';
ALTER SYSTEM SET work_mem = '16MB';
ALTER SYSTEM SET maintenance_work_mem = '1GB';
ALTER SYSTEM SET random_page_cost = '1.1';
ALTER SYSTEM SET effective_io_concurrency = '200';
-- shared_buffers needs a restart
SELECT pg_reload_conf();

Then restart for shared_buffers to take effect.

Validating the tuning

Three things to measure before/after:

Cache hit rate — should be >99% for hot data:

SELECT
  sum(blks_hit) * 100.0 / NULLIF(sum(blks_hit) + sum(blks_read), 0) AS cache_hit_pct
FROM pg_stat_database;

Sort spills to disk — should drop:

SELECT sum(temp_bytes) AS bytes_spilled_to_disk
FROM pg_stat_database;

(Reset baseline with pg_stat_reset() before tuning if you want a clean comparison.)

Top slow queries — via pg_stat_statements. Compare mean_exec_time before/after.

Common Pitfalls

Setting work_mem globally to 256 MB because one query needs it. Every connection × every operation now potentially allocates 256 MB. Forty connections doing two sorts each = 20 GB committed. OOM. Use SET LOCAL for the heavy query.

Forgetting shared_buffers needs a restart. pg_reload_conf() won’t apply it. Schedule a maintenance window.

Setting effective_cache_size too low because “we have only 4 GB shared_buffers.” It’s not the size of shared_buffers — it includes OS page cache. On most servers, 50–75% of total RAM is right.

Ignoring NUMA. On big multi-socket machines, NUMA policies can carve memory in ways Postgres doesn’t anticipate. Pin Postgres to a single NUMA node if you’ve got that kind of hardware (you probably don’t).

Tuning memory before knowing your workload. A read-heavy analytics workload wants big work_mem and big effective_cache_size. A high-throughput OLTP workload wants modest work_mem and big shared_buffers. The “right” numbers depend on what queries actually run.

Wrapping Up

Memory tuning is unglamorous. It’s also where most “Postgres is slow” tickets get resolved without writing a single line of code. Four settings, half a day’s work, restart, measure, win. Next: autovacuum tuning — the third item on the checklist, and the one most likely to be silently failing in your production cluster right now.

shared_buffers — the dedicated cache

effective_cache_size — a hint, not an allocation

work_mem — the per-operation budget

maintenance_work_mem — bigger, less often

Putting it together for an 8 GB instance

Validating the tuning

Common Pitfalls

Wrapping Up

Related posts

Postgres Performance Tuning, My 2022 Checklist

Advanced PostgreSQL 17 Tuning at Scale, A Practical Guide

Postgres Bulk Loading, COPY and Prepared Statements

Postgres Multi-Column Indexes, Column Order Is Almost Everything

Postgres Partial Indexes That Make Slow Queries Disappear

Postgres Index Types, B-tree, GIN, GiST, BRIN, When Each One Wins

Postgres Autovacuum, When Defaults Fail and How to Fix Them

Reading the Postgres Query Planner, A Practical EXPLAIN Walkthrough

Let’s Start a Project