Citus for Sharded Multi Tenant Postgres, A Production Walkthrough
TL;DR — Citus 12.1 turns Postgres into a horizontally scalable sharded database, but only if you pick a distribution key that matches your access patterns and colocate related tables correctly. Get that wrong and you’ll be slower than a single-node Postgres with three times the operational burden.
I’ve watched three different teams adopt Citus and have very different outcomes. The team that thought hardest about their distribution key up front had a smooth migration and now runs comfortably at 50 TB across eight shards. The team that picked a random distribution key spent six months migrating data after realizing every query was a cross-shard join. Citus rewards careful design.
Citus 12.1 (the late 2024 release that’s stabilized through mid-2025) is a meaningfully better Citus than even a year ago. The new shard-rebalancing improvements, the better support for foreign keys across distributed tables, and the tighter Postgres 17 integration all matter. But the fundamentals haven’t changed. You still have to think about your data layout.
This walkthrough is for teams considering or starting with Citus. I assume you know Postgres reasonably well and have read my earlier piece on native Postgres partitioning patterns. Citus extends that mental model to multiple machines.
1. The Citus Model
A Citus cluster has one coordinator node and many worker nodes. The coordinator holds metadata, plans queries, and routes them to workers. The workers hold the actual data, split across shards.
+-------------------+
| Coordinator |
| (metadata, plan) |
+-------------------+
|
+----------+--------+--------+----------+
| | | | |
v v v v v
+--------+ +--------+ +--------+ +--------+ +--------+
|Worker 1| |Worker 2| |Worker 3| |Worker 4| |Worker 5|
|Shards: | |Shards: | |Shards: | |Shards: | |Shards: |
|0,5,10..| |1,6,11..| |2,7,12..| |3,8,13..| |4,9,14..|
+--------+ +--------+ +--------+ +--------+ +--------+
When you distribute a table, Citus splits it into 32 shards by default. Each shard is a regular Postgres table on a worker. A query that filters by the distribution key goes to one shard. A query that doesn’t filter by it fans out to all shards.
Three table types matter:
- Distributed tables — Sharded across workers by a distribution key.
- Reference tables — Replicated to all workers (for small lookup tables).
- Local tables — Live only on the coordinator (for non-distributed data).
2. Choosing The Distribution Key
This is the most important decision you’ll make. The distribution key determines which queries are fast (single-shard) and which are slow (cross-shard).
Multi-tenant SaaS pattern
The classic Citus use case is multi-tenant SaaS where every query filters by tenant. Make tenant_id the distribution key for every distributed table:
-- On the coordinator
CREATE EXTENSION citus;
SELECT citus_add_node('worker1', 5432);
SELECT citus_add_node('worker2', 5432);
SELECT citus_add_node('worker3', 5432);
CREATE TABLE companies (
id bigint NOT NULL,
name text NOT NULL,
plan text,
PRIMARY KEY (id)
);
CREATE TABLE users (
id bigint NOT NULL,
company_id bigint NOT NULL,
email text NOT NULL,
PRIMARY KEY (company_id, id)
);
CREATE TABLE projects (
id bigint NOT NULL,
company_id bigint NOT NULL,
name text,
PRIMARY KEY (company_id, id)
);
-- Distribute by company_id, with colocation
SELECT create_distributed_table('companies', 'id');
SELECT create_distributed_table('users', 'company_id', colocate_with => 'companies');
SELECT create_distributed_table('projects', 'company_id', colocate_with => 'companies');
Notice the primary keys include company_id. This is required for distributed tables. The distribution column must be part of the primary key and any unique constraint.
Colocation matters
Colocation means two tables with the same distribution key value live on the same shard. This lets joins happen locally on each worker, in parallel. Without colocation, joins are cross-shard, which is expensive.
-- Fast: colocated join
SELECT u.email, p.name
FROM users u JOIN projects p ON u.company_id = p.company_id
WHERE u.company_id = 42;
-- Slow: not joined on distribution key
SELECT u.email, p.name
FROM users u JOIN projects p ON u.id = p.created_by;
The second query has no company_id join condition, so Citus can’t push it down to a single shard. It needs to ship data between workers. Don’t do this in hot paths.
3. Reference Tables For Lookups
Some tables don’t make sense to distribute. Things like a list of countries, plan types, or feature flags. These should be reference tables, which are replicated to every worker:
CREATE TABLE plans (
id text PRIMARY KEY,
name text,
monthly_price numeric,
features jsonb
);
SELECT create_reference_table('plans');
Now any worker can join against plans locally without cross-shard communication. The cost is that writes go to every worker, so reference tables should be small and slow-changing.
A useful rule of thumb: reference tables should be under 1 GB and changes should be rare (less than once per minute).
In practice, the things that fit this pattern are static dimension data. Country lists, currency codes, plan definitions, feature flag tables, role definitions. Anything that changes per-tenant should be a regular distributed table, not a reference table. If you make a reference table out of something that gets written to frequently, every worker pays the write cost and your cluster slows down for everyone.
4. Distributed Functions And Schemas
Citus 12.1 supports schema-based sharding, which is a different approach to multi-tenancy. Each tenant gets its own Postgres schema, and Citus distributes schemas across workers.
-- Enable schema-based sharding
SET citus.enable_schema_based_sharding TO ON;
-- Each schema becomes a distribution unit
CREATE SCHEMA tenant_acme;
CREATE TABLE tenant_acme.users (id bigint, email text);
CREATE TABLE tenant_acme.projects (id bigint, name text);
-- Citus automatically places all of tenant_acme on one worker
This is the easier model if your tenants have isolated schemas. It’s the model Citus uses for their managed offering on Azure. Pros: simpler queries, easier tenant migration. Cons: you can’t share data between schemas easily.
For most teams I still recommend row-level sharding (the previous section). Schema-based works best when tenants truly are isolated and you have hundreds to thousands of small tenants.
5. Operations And Rebalancing
Adding a worker
When you add a worker, existing shards don’t move automatically. You need to rebalance:
-- Add the new worker
SELECT citus_add_node('worker6', 5432);
-- Plan a rebalance
SELECT citus_rebalance_start();
-- Check progress
SELECT * FROM citus_rebalance_status();
The rebalancer in 12.1 uses logical replication to move shards without blocking writes. This is a huge improvement over older versions. A 100 GB shard moves in 30-60 minutes depending on network and how busy the source shard is.
Removing a worker
You can’t just shut down a worker. Drain it first:
SELECT citus_drain_node('worker3', 5432);
SELECT citus_remove_node('worker3', 5432);
citus_drain_node moves all shards off the node, then citus_remove_node removes it from the cluster metadata.
Backups
Citus backups are per-worker, plus the coordinator metadata. The simplest production approach:
- Use
pg_dumpon the coordinator with--no-table-datato get the metadata. - Use a coordinated
pg_basebackupor filesystem snapshot on every worker. - Restore is the reverse: restore each worker, then restore coordinator metadata.
For point-in-time recovery, set up WAL archiving on every worker. The coordinator records the time of every distributed write, but you need the worker WAL to actually recover.
Monitoring
Citus adds several useful views:
-- Shard distribution
SELECT nodename, count(*) AS shard_count,
pg_size_pretty(sum(shard_size)) AS total_size
FROM citus_shards
GROUP BY nodename
ORDER BY nodename;
-- Slow distributed queries
SELECT query, partition_key, executor, calls,
round(total_time::numeric, 2) AS total_ms
FROM citus_stat_statements
ORDER BY total_time DESC
LIMIT 20;
-- Current queries across the cluster
SELECT * FROM citus_stat_activity
WHERE state != 'idle';
citus_stat_statements is Citus’s distributed version of pg_stat_statements. It tracks query patterns across the whole cluster.
6. Query Patterns That Work
Citus is fastest when queries fit one of three patterns: single-shard, colocated, and reference table joins.
Single-shard queries
-- Fast: routed to a single worker
SELECT * FROM users WHERE company_id = 42 AND id = 123;
The coordinator sees the company_id = 42 filter, computes which shard holds that company’s data, and forwards the query directly to that worker. Latency is roughly the same as single-node Postgres.
Colocated joins
-- Fast: each worker joins its local shards in parallel
SELECT u.email, count(p.id) AS project_count
FROM users u LEFT JOIN projects p USING (company_id)
WHERE u.company_id IN (SELECT id FROM companies WHERE plan = 'pro')
GROUP BY u.email;
Each worker runs the join on its own shards independently. The coordinator just merges results. This scales linearly with shards.
Reference joins
-- Fast: each worker has its own copy of plans
SELECT u.email, p.name AS plan
FROM users u JOIN plans p ON u.plan_id = p.id
WHERE u.company_id = 42;
The reference table plans is local to every worker, so the join is always local.
What’s slow
-- Slow: cross-shard, requires gathering data
SELECT * FROM users WHERE email = 'foo@example.com';
-- Slow: join on non-distribution key
SELECT u.email, c.name
FROM users u JOIN companies c ON u.id = c.contact_user_id;
The first query has to ask every worker. The second has a join on something other than the distribution key, which forces data movement between workers.
If you have queries like these in hot paths, your distribution key is wrong or you need additional secondary structures (like a reference table mapping emails to company IDs).
Common Pitfalls
1. Picking the wrong distribution key
If you pick a distribution key that doesn’t match your queries, you’re stuck. Re-distributing a table requires reading every row and copying it. On a 5 TB table that’s days of work. Spend a week designing your schema before you create your first distributed table.
2. Schema migrations that touch many tables
In single-node Postgres, ALTER TABLE is fast. In Citus, an ALTER TABLE on a distributed table runs on every shard. If you’re doing a complex migration (renaming columns, changing types), it can take a long time. Use pg_dist_partition and Citus migration patterns instead of treating it like single-node.
3. Forgetting that pg_dump doesn’t work directly
You can’t pg_dump a Citus database the way you would a normal Postgres database. The data is on the workers. Use citus_dump or run pg_dump on each worker separately. Citus 12.1 added better pg_dump integration but it’s still not seamless.
4. Cross-shard transactions are slow
A transaction that touches multiple shards uses 2-phase commit, which is meaningfully slower than a single-shard transaction. Avoid them in hot paths. Design so that transactional units fit within a single distribution key.
Troubleshooting
Query is unexpectedly slow
Use EXPLAIN to see what Citus is doing:
EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM users WHERE email = 'foo@example.com';
If you see “Custom Scan (Citus Adaptive)” with task counts equal to your shard count, the query is fanning out. Either add the distribution key to the WHERE clause or accept that this query touches all shards.
Rebalance stuck
SELECT * FROM citus_rebalance_status();
If a rebalance is stuck, usually a logical replication slot got dropped or a worker is unreachable. Check the worker logs. Resume with:
SELECT citus_rebalance_start();
Shard count is high after many tenant deletions
Citus does not automatically merge shards. If you delete a lot of tenants, you may have many small shards. You can manually consolidate using citus_split_shard_by_split_points() in reverse (merging) but this is a rare operation. Usually it’s fine to leave shards uneven.
For the underlying Postgres parallelism concepts, the Postgres parallel query documentation is a useful reference.
Wrapping Up
Citus is a fantastic tool when you have a clear distribution key and a multi-tenant workload that fits the model. It’s a painful tool when you don’t. The choice is made when you decide your distribution key, not when you install the extension. Take time on that decision.
If you’re considering Citus, also evaluate CockroachDB for comparison. They solve overlapping problems with different tradeoffs.