background-shape
Postgres Logical Replication for CDC
October 7, 2022 · 4 min read · by Muhammad Amal programming

TL;DR — Postgres 14 logical replication: wal_level=logical, create publication, replication slot, consume via pgoutput (built-in) or wal2json. Powers Debezium, custom CDC consumers, multi-region replication. Watch out for unfilled slots — they retain WAL and fill disk.

After CDC vs polling, how Postgres actually exposes change data. Logical replication is the standard since 10.0; 14 adds prepared transactions support and other refinements.

Setup

postgresql.conf:

wal_level = logical
max_replication_slots = 10
max_wal_senders = 10

pg_hba.conf — allow replication connections:

host replication app 10.0.0.0/8 md5

Restart Postgres. From here, you create publications (what data to replicate) and the consumer creates slots (where it left off).

Publications

A publication is “these tables, these operations”:

-- All operations on specific tables
CREATE PUBLICATION my_pub FOR TABLE orders, customers, products;

-- All tables in the database
CREATE PUBLICATION my_pub FOR ALL TABLES;

-- Just INSERTs and UPDATEs (no DELETE):
CREATE PUBLICATION my_pub FOR TABLE orders WITH (publish = 'insert,update');

FOR ALL TABLES is convenient but loud. Prefer explicit lists.

Replication slots

A slot is the consumer’s bookmark. Postgres keeps WAL since the slot’s last read position so consumers can resume after disconnect.

Two ways to create:

-- Via SQL (slot stays until dropped)
SELECT pg_create_logical_replication_slot('my_slot', 'pgoutput');

-- Via the consumer (Debezium etc. create their own)

The plugin name (pgoutput, wal2json, etc.) determines output format. Built-in pgoutput is binary; wal2json emits JSON.

Consuming with pgoutput (binary)

Standard tooling. Used by Debezium, replication subscribers, etc.

A subscription (Postgres → Postgres replication):

-- On the destination DB
CREATE SUBSCRIPTION my_sub
CONNECTION 'host=src-db user=replicator dbname=app password=...'
PUBLICATION my_pub;

Destination DB pulls from source’s publication, applies changes. For DB-to-DB replication this is the simplest path. No external tools.

For CDC into other targets (Kafka, BigQuery), use Debezium (covered Oct 21) which speaks pgoutput.

Consuming with wal2json (JSON output)

Install the plugin:

apt install postgresql-14-wal2json

Then peek at the slot:

SELECT pg_create_logical_replication_slot('etl_slot', 'wal2json');

Query changes:

SELECT * FROM pg_logical_slot_get_changes('etl_slot', NULL, NULL,
  'pretty-print', '1',
  'include-pk', '1',
  'include-types', '1');

Output (one row per transaction):

{
  "change": [
    {
      "kind": "update",
      "schema": "public",
      "table": "orders",
      "columnnames": ["id", "status", "updated_at"],
      "columntypes": ["bigint", "text", "timestamp"],
      "columnvalues": [42, "shipped", "2022-10-07T..."],
      "oldkeys": {"keynames": ["id"], "keyvalues": [42]}
    }
  ]
}

Easy to parse in any language. Less mature than Debezium for production; fine for custom small consumers.

Custom consumer in Go

import "github.com/jackc/pglogrepl"

func consume(ctx context.Context, conn *pgconn.PgConn) error {
    err := pglogrepl.StartReplication(ctx, conn, "etl_slot", lsn,
        pglogrepl.StartReplicationOptions{
            PluginArgs: []string{"proto_version '1'", "publication_names 'my_pub'"},
        })
    if err != nil { return err }

    for {
        msg, err := conn.ReceiveMessage(ctx)
        if err != nil { return err }
        switch m := msg.(type) {
        case *pgproto3.CopyData:
            if m.Data[0] == pglogrepl.XLogDataByteID {
                xld, _ := pglogrepl.ParseXLogData(m.Data[1:])
                processWAL(xld.WALData)
            }
        }
    }
}

For Debezium / Kafka Connect users this is overkill. For projects where you want exactly one consumer pulling Postgres changes into your custom destination: 300 lines.

Replication slot gotchas

The biggest operational risk: slot grows infinitely if consumer stops.

Postgres keeps WAL since the oldest slot’s position. If a consumer dies and nobody notices, WAL accumulates. Eventually fills disk; Postgres stops accepting writes; full outage.

Monitor slot lag:

SELECT slot_name, 
       pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) AS lag_bytes
FROM pg_replication_slots;

Alert if lag > some threshold (e.g., 10 GB). Catches stalled consumers before disk fills.

If a slot is abandoned (consumer permanently gone), drop it:

SELECT pg_drop_replication_slot('orphan_slot');

Initial snapshot

CDC starts from “now.” For tables with existing data, you need a snapshot first.

Process:

  1. Pause source writes (or accept brief inconsistency)
  2. Take snapshot via standard SELECT * → load to destination
  3. Note current LSN: SELECT pg_current_wal_lsn()
  4. Start CDC consumer from that LSN

Debezium handles this automatically. Custom consumers must implement it.

For low-volume tables, simpler: pause writes briefly, dump+load, resume, start CDC. For high-volume, online snapshot is necessary.

Common Pitfalls

Unattended slot growing. Alert on slot lag. The most common Postgres outage cause from CDC.

wal_level=replica instead of logical. Doesn’t support logical replication. Must restart Postgres after changing.

Replication user without REPLICATION role. Auth fails with cryptic error. ALTER USER replicator REPLICATION;

Forgetting pg_hba.conf entry for replication. Connection refused, replication-specific. Add host replication ... line.

Publication for ALL TABLES then schema changes. New tables auto-replicate; existing consumers may not handle them. Be explicit.

Logical replication of large tables without snapshot strategy. Replication only catches changes from start point. Need separate snapshot to backfill.

Wrapping Up

Postgres logical replication = configure wal_level, create publication, consumer creates slot, stream changes. Foundation for CDC. Monday: Python ETL pipeline patterns.