Logical Replication for Blue Green Postgres Deploys
TL;DR — Postgres 17 made logical replication usable for blue-green by syncing replication slots to standbys and shipping
pg_createsubscriber. The cutover is six steps, the rollback is two, and the failure mode you actually care about is sequence drift.
Blue-green deploys are how I do major Postgres version upgrades, hardware migrations, and any schema change too risky for an in-place rollout. The pattern is the same in every case: stand up a parallel “green” cluster, replicate the live “blue” cluster into it, switch traffic at a coordinated moment, keep blue running as the rollback target for a day or two.
Until recently, doing this with logical replication had two operational rough edges. Slots didn’t sync to standbys, so a failover during cutover meant manually recreating subscriptions. And bootstrapping a subscriber required a separate physical clone or pg_dump-based seed, both of which had race conditions with the WAL stream.
Postgres 17 fixed both. This post is the actual runbook I use, end-to-end, with the commands and the gotchas. It’s the upgrade pattern from 16 to 17, but the same approach works for any version pair logical replication supports (so, anything 10+).
Why logical, not physical
Physical replication ships byte-level WAL. It requires the same major version, the same architecture, the same data directory layout. Blue and green must be identical. That’s safe but rigid.
Logical replication ships row-level changes over a publication/subscription model. Major version differences are fine. Schema differences are tolerable. You can replicate a subset of tables, or filter rows by a WHERE clause on the publication. This is the flexibility blue-green deploys need.
The historical cost was operational complexity. Postgres 17 reduced that cost to “manageable”.
Step 1, publish on blue
-- on blue (16.4)
ALTER SYSTEM SET wal_level = logical;
-- restart required if changing from replica
ALTER SYSTEM SET max_replication_slots = 20;
ALTER SYSTEM SET max_wal_senders = 20;
-- pg_ctl restart
CREATE PUBLICATION blue_pub FOR ALL TABLES;
FOR ALL TABLES is the simplest publication, and the right choice when the schema is identical on both sides. For partial replication or schema reshape, name tables explicitly.
The publication holds metadata only. No data flows until a subscriber connects to a replication slot.
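For the partial case, an explicit publication might look like the sketch below. The publication name blue_partial_pub, the table names, and the id cutoff are illustrative assumptions rather than part of this runbook, and row filters need Postgres 15 or later on the publisher.

```sql
-- Hypothetical partial publication: all of users, but only newer orders.
-- The filter uses the primary key because, when UPDATE/DELETE are published,
-- a row filter may only reference replica identity columns.
CREATE PUBLICATION blue_partial_pub
    FOR TABLE users,
              orders WHERE (id >= 1000000);

-- Check what the publication actually covers before subscribing.
SELECT pubname, schemaname, tablename, rowfilter
FROM pg_publication_tables
WHERE pubname = 'blue_partial_pub';
```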
Step 2, snapshot blue to green
# on green host
# plain format (-Fp) so the directory can be started directly as a standby
pg_basebackup -h blue.internal -U replicator -D /var/lib/postgresql/17/data \
  -Fp -X stream -P -R
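This gives you a physical clone of blue on the green host. One constraint to respect: a base backup is a byte-level copy in blue’s on-disk format, and pg_createsubscriber requires the source and target clusters to be on its own major version. The physical seed therefore applies when blue is already on 17 or later; for a jump from 16, seed green with a plain CREATE SUBSCRIPTION and its initial table copy instead. Either way, the clone is only a starting point, and we’ll convert it.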
Stop the physical standby once it’s caught up, then use pg_createsubscriber:
# run on the green host, against the stopped standby's data directory
pg_createsubscriber -D /var/lib/postgresql/17/data -d app \
  --publisher-server="host=blue.internal port=5432 dbname=app" \
  --publication=blue_pub --subscription=green_sub \
  --replication-slot=green_sub
pg_createsubscriber (new in 17) takes a stopped physical standby and converts it into a logical subscriber, with the WAL position recorded so no rows are dropped or duplicated. This is the step that used to require a custom seeding script.
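After the conversion it may be worth confirming that green really is a working subscriber before moving on. A minimal sketch using the standard subscription catalogs; treating "every table in state r" as the healthy condition is my assumption about what to look for.

```sql
-- on green: the subscription should exist and be enabled
SELECT subname, subenabled, subslotname
FROM pg_subscription;

-- on green: per-table sync state; 'r' means ready (streaming changes)
SELECT srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel
ORDER BY 1;
```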
Step 3, watch replication lag
From blue:
SELECT slot_name, active, restart_lsn,
pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes
FROM pg_replication_slots WHERE slot_name = 'green_sub';
From green:
SELECT subname, received_lsn, latest_end_lsn,
latest_end_time, last_msg_receipt_time
FROM pg_stat_subscription;
Lag should converge to a few MB during steady state. If it grows unboundedly, your subscriber is too slow — usually a missing index on the subscriber side, or a REPLICA IDENTITY set to FULL causing huge WAL records.
I don’t cut over until lag is < 1 MB for at least 30 minutes.
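The gate can be made explicit with a query like the sketch below. It measures against confirmed_flush_lsn (what the subscriber has acknowledged) rather than restart_lsn (how much WAL the slot retains), which is closer to what the 1 MB rule is about. The column aliases are my own.

```sql
-- on blue: bytes the subscriber has not yet confirmed, plus a pass/fail flag
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS unconfirmed,
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) < 1024 * 1024 AS under_1mb
FROM pg_replication_slots
WHERE slot_type = 'logical';
```

In psql, replacing the final semicolon with \watch 60 reruns it every minute, which makes the 30-minute observation window easy to eyeball.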
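Step 4, sync slots to blue’s standby
The logical slot lives on the publisher, so the failover that Postgres 17’s slot sync protects against is blue’s. If blue has its own physical standby (it should), set failover = true on the subscription so its slot is eligible, then let the standby sync it:
-- on blue's primary: logical decoding waits until this physical slot has the WAL
ALTER SYSTEM SET synchronized_standby_slots = 'blue_standby_1';
SELECT pg_reload_conf();
-- on blue's standby (also needs hot_standby_feedback = on and dbname in primary_conninfo)
ALTER SYSTEM SET sync_replication_slots = on;
SELECT pg_reload_conf();
This means if blue’s primary dies mid-cutover, you can promote its standby and green keeps streaming from the synced slot. Before 17, this scenario required manually recreating the subscription and reseeding, which was the most stressful possible time to be doing manual ops. Green’s own standby needs nothing extra: the subscription is catalog state, and physical replication already carries it.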
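To check that slot sync is actually working, the standby that has sync_replication_slots = on can be queried directly. A small sketch; the failover and synced columns are new in 17.

```sql
-- on the standby: synced = true means the slot was copied from the primary
SELECT slot_name, failover, synced, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical';
```

An empty result usually means the slot on the primary was not created with failover enabled, so there is nothing for the standby to sync.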
Step 5, the cutover window
This is the only moment when you accept downtime, and it should be < 60 seconds.
-- on blue: stop writes
ALTER DATABASE app CONNECTION LIMIT 0;
-- terminate non-replication connections
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'app' AND backend_type = 'client backend'
  AND pid <> pg_backend_pid();  -- don't terminate this session
-- wait for green to catch up: lag = 0
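Confirm that green has applied everything, not just received it: pg_stat_subscription.latest_end_lsn on green (or confirmed_flush_lsn on blue’s slot) should match pg_current_wal_lsn() on blue. Then: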
-- on green: advance sequences
SELECT setval('orders_id_seq', (SELECT max(id) FROM orders));
SELECT setval('users_id_seq', (SELECT max(id) FROM users));
-- ... for every sequence in the schema
This is the step that bites people. Logical replication does not replicate sequence advances. After cutover, green’s sequences are still at whatever values they held when green was seeded, so inserts will collide with existing IDs until you reset them.
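Listing sequences by hand is easy to get wrong, so one option is to generate the setval calls from the catalogs. The sketch below covers sequences owned by a column (serial and identity) and assumes ascending sequences that start at 1; standalone sequences not owned by any column are skipped and still need manual handling.

```sql
-- on green: emit one setval() statement per column-owned sequence
SELECT format(
         'SELECT setval(%L, COALESCE(max(%I), 1), max(%I) IS NOT NULL) FROM %s;',
         s.oid::regclass::text,    -- sequence name, schema-qualified and quoted as needed
         a.attname, a.attname,     -- owning column
         t.oid::regclass::text     -- owning table
       ) AS stmt
FROM pg_class s
JOIN pg_depend d ON d.classid = 'pg_class'::regclass
                AND d.objid = s.oid
                AND d.refclassid = 'pg_class'::regclass
                AND d.deptype IN ('a', 'i')   -- 'a' = serial/OWNED BY, 'i' = identity
JOIN pg_class t ON t.oid = d.refobjid
JOIN pg_attribute a ON a.attrelid = t.oid
                   AND a.attnum = d.refobjsubid
WHERE s.relkind = 'S';
```

In psql, ending the query with \gexec instead of the semicolon runs each generated statement. For an empty table the third argument is false, so the next nextval() returns 1 rather than 2.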
Flip DNS / load balancer / connection string to point at green. Connections drain from blue, new connections land on green.
Step 6, keep blue read-only for rollback
Don’t drop blue immediately. Make it read-only:
ALTER SYSTEM SET default_transaction_read_only = on;
SELECT pg_reload_conf();
If something goes wrong on green in the first 24 hours, you can flip DNS back to blue. After 24 hours of clean operation on green, decommission blue.
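One caveat worth checking: default_transaction_read_only is only a default, so a session can still opt out of it. A small sketch of verifying the setting and seeing the override:

```sql
-- on blue, in a new session after the reload
SHOW default_transaction_read_only;   -- should report: on

-- Ordinary sessions now start read-only transactions,
-- but any session can still override the default for itself:
BEGIN READ WRITE;
-- statements here would be allowed to write
ROLLBACK;
```

If blue must not accept a single write during the rollback window, keeping CONNECTION LIMIT 0 in place for application roles is a stricter guard than the read-only default alone.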
Rollback procedure
If you need to roll back during the cutover window itself (the first hour), it’s simple:
- Flip DNS back to blue.
- Run ALTER DATABASE app CONNECTION LIMIT -1; on blue.
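Blue still has the data through the moment writes stopped. If you had already made blue read-only (Step 6), also reset default_transaction_read_only and reload before sending traffic back. The few writes that hit green between cutover and rollback are lost. Decide ahead of time whether that’s acceptable.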
If you need to roll back after green has accepted writes for hours and you want to keep them, you need bidirectional logical replication during the window, which is genuinely complex and not what I’d recommend for a first blue-green cutover. The single-direction-with-a-short-rollback-window pattern is enough for 95% of upgrades.
For the broader topic of how to land schema changes during a window like this, see zero downtime Postgres migrations in 2024.
What logical replication doesn’t do
Two surprises worth knowing about up front:
- Sequences. Discussed above. Reset them explicitly at cutover.
- DDL. CREATE TABLE, ALTER TABLE, etc. don’t propagate. Run schema changes on both sides during the cutover window, in the same order, or use a tool like Bytebase or pgroll to coordinate. Logical DDL replication has been discussed for later releases; 17 doesn’t have it.
A handful of other gotchas:
- Large objects (pg_largeobject) don’t replicate logically.
- TRUNCATE does replicate by default in 17. If you don’t want that, set publish = 'insert,update,delete' on the publication.
- Replication slots hold WAL on blue until consumed. If green’s subscription falls behind permanently or you forget to drop it, blue’s disk fills up.
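The last two gotchas translate into a couple of short statements. A sketch, reusing the blue_pub and green_sub names from earlier; the retained_wal alias is my own.

```sql
-- on blue: stop publishing TRUNCATE
ALTER PUBLICATION blue_pub SET (publish = 'insert, update, delete');

-- on blue: WAL retained by each slot; an inactive slot with growing
-- retention is the one that eventually fills the disk
SELECT slot_name, active, wal_status,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;

-- on green, once the subscription is no longer needed:
-- DROP SUBSCRIPTION green_sub;   -- also drops the slot on blue if blue is reachable
```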
The official logical replication chapter is required reading before you do this for real.
Common Pitfalls
A short list of things that have cost me sleep.
- Forgetting REPLICA IDENTITY. Tables without a primary key need ALTER TABLE t REPLICA IDENTITY FULL, which logs the entire row on every update. WAL volume goes through the roof. Always have a primary key or a unique non-null index, and set REPLICA IDENTITY to it.
- max_replication_slots too low. Default is 10. Cutovers use slots transiently, and a failed cleanup leaves orphans. I run 20+ in production.
- TOAST tables in logical decoding. Large updated columns with unchanged TOAST values can confuse replication. Postgres 17 improved this but it’s still a source of subtle bugs. Test with realistic data.
- Subscriber missing extensions. pgvector, postgis, etc. must be installed on green or replication fails when it encounters the types. Install before subscribing.
- max_wal_size too small on blue. During the catch-up phase, WAL accumulates. Bump it temporarily.
- No traffic shaping. If you flip DNS and 100% of writes land on a cold green cluster, the buffer cache is empty and p99 spikes. Warm green by replaying a query mix in shadow mode for an hour beforehand.
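The first and fourth pitfalls can be checked ahead of time from the catalogs. A sketch; the table and index names in the commented ALTER TABLE are hypothetical.

```sql
-- on blue: ordinary tables that use the default replica identity but have
-- no primary key, so UPDATE/DELETE on them will fail once they are published
SELECT c.oid::regclass AS table_name
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND c.relreplident = 'd'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND NOT EXISTS (
        SELECT 1 FROM pg_index i
        WHERE i.indrelid = c.oid AND i.indisprimary);

-- preferred fix when a unique, non-null index exists (hypothetical names)
-- ALTER TABLE events REPLICA IDENTITY USING INDEX events_uid_key;

-- run on both blue and green and diff the output to catch missing extensions
SELECT extname, extversion FROM pg_extension ORDER BY extname;
```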
The mistake I see most is doing the rehearsal without the actual data volume. A cutover that takes 30 seconds on 10 GB of staging data may take 30 minutes on 10 TB of production WAL backlog. Rehearse on a clone of production size or you have rehearsed nothing.
What’s Next
Blue-green with logical replication is the safest upgrade pattern available for Postgres in 2024, and 17 is the first release where I’d recommend it without caveats for major version upgrades. Older versions still work, but they require more careful failover handling.
The next things I’d want from this pattern are logical DDL replication (coming, probably 18 or 19) and sequence replication (no firm timeline). Until then, the cutover script needs to handle both explicitly.
Run it once on a staging clone, then schedule the production cutover for a quiet window. With 17’s improvements, this is no longer a heroics-required operation.