background-shape
Distributed SQL with CockroachDB 24.3, A Comprehensive Tutorial
June 9, 2025 · 9 min read · by Muhammad Amal programming

TL;DR — CockroachDB 24.3 gives you Postgres-compatible SQL with multi-region resilience and horizontal scaling, but it isn’t drop-in Postgres. Master locality, transactions, and schema patterns before you put it under load.

CockroachDB occupies a strange spot in the database market. It speaks the Postgres wire protocol, runs most Postgres SQL, and looks like Postgres from the outside. But internally it’s a completely different beast. Range-based storage, Raft consensus, distributed transactions, and a query planner that has to think about where data physically lives. Treating it like Postgres works for a demo. It will hurt you in production.

I’ve shipped CockroachDB into a handful of multi-region applications now, and 24.3 (the December 2024 release that’s stabilized through mid-2025) is the first version I’d recommend to teams without a dedicated infra engineer. The vector search support is real, the cost-based optimizer is genuinely good, and the multi-tenant features finally make sense.

This guide walks through how to think about CockroachDB, how to set up a cluster, and how to design schemas that don’t fall over. If you’re new to distributed SQL, you might want to read my older post on logical replication for blue-green deploys first to understand what CockroachDB is replacing.

1. The Mental Model

Every row in CockroachDB lives in a range, which is a contiguous chunk of the keyspace. Ranges default to 512 MB. When a range gets too big, it splits. When a range gets too small, it merges with a neighbor.

+--------------------------------------------------+
|                  Keyspace                        |
|                                                  |
|  [users_1-100]  [users_101-500]  [users_501-...] |
|       |              |                |          |
|     Range 1        Range 2          Range 3      |
|       |              |                |          |
|   Node A,B,C      Node B,C,D       Node A,C,E    |
|     (Raft)         (Raft)            (Raft)      |
+--------------------------------------------------+

Each range is replicated to (by default) three nodes via Raft. Writes need a quorum (two of three) to commit. Reads can be served from the leaseholder, which is one specific replica that holds the lease.

This means three things:

  1. Writes cost a round trip between replicas. If they’re in different regions, that’s a wide-area round trip.
  2. Reads from the leaseholder are fast. Follower reads (a 24.3 feature improvement) let you serve stale reads from any replica.
  3. The placement of replicas and leases is your most important tuning lever.

2. Setting Up A Cluster

A real CockroachDB cluster has at least three nodes, ideally spread across availability zones or regions.

Three node single-region cluster

# On each node, after installing the cockroach binary
cockroach start \
  --insecure \
  --store=/data/cockroach \
  --listen-addr=0.0.0.0:26257 \
  --http-addr=0.0.0.0:8080 \
  --join=node1:26257,node2:26257,node3:26257 \
  --locality=region=us-east,zone=us-east-1a \
  --background

Then initialize on any one node:

cockroach init --insecure --host=node1:26257

The --locality flag is critical. CockroachDB uses it to make replica placement decisions and to enable region-aware features like REGIONAL BY ROW tables.

Multi-region cluster

For three regions, you want nine nodes minimum (three per region). Locality flags differ:

# Region 1 nodes
--locality=region=us-east,zone=us-east-1a
--locality=region=us-east,zone=us-east-1b
--locality=region=us-east,zone=us-east-1c

# Region 2 nodes
--locality=region=eu-west,zone=eu-west-1a
# ... and so on

Then tell the database which regions exist:

ALTER DATABASE shopdb PRIMARY REGION 'us-east';
ALTER DATABASE shopdb ADD REGION 'eu-west';
ALTER DATABASE shopdb ADD REGION 'ap-southeast';
ALTER DATABASE shopdb SURVIVE REGION FAILURE;

SURVIVE REGION FAILURE means CockroachDB will keep the cluster online even if an entire region dies. The cost is higher write latency because quorum now needs cross-region acks.

3. Schema Design For Distributed SQL

This is where teams trip. The Postgres patterns you know work, but they’re often not optimal.

Primary keys

Sequential integer primary keys are toxic in CockroachDB. Every insert goes to the same range, that range gets hot, then splits, then the new range gets hot. You end up with a single-range bottleneck no matter how big your cluster is.

-- Bad
CREATE TABLE events (
    id SERIAL PRIMARY KEY,
    ...
);

-- Good
CREATE TABLE events (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    ...
);

UUIDs distribute writes across the keyspace. If you absolutely need a monotonic ID for ordering, use unique_rowid() or generate IDs on the application side with a snowflake-style scheme.

Multi-region tables

CockroachDB 24.3 supports three table localities:

-- All replicas in primary region, slow cross-region reads/writes
CREATE TABLE config (...) LOCALITY GLOBAL;

-- One copy per region, fast local reads
CREATE TABLE products (...) LOCALITY REGIONAL BY TABLE IN PRIMARY REGION;

-- Each row lives in a specific region
CREATE TABLE users (
    id UUID PRIMARY KEY,
    email TEXT,
    home_region crdb_internal_region NOT NULL
) LOCALITY REGIONAL BY ROW AS home_region;

REGIONAL BY ROW is the killer feature. A row belongs to a region, and queries that filter by region get fast local reads. Inserts go to the home region’s replicas. Cross-region reads are still possible but slower.

-- Fast: stays in user's home region
SELECT * FROM users WHERE id = '...' AND home_region = 'us-east';

-- Slow: queries multiple regions
SELECT * FROM users WHERE email = 'foo@example.com';

The second query is slow because there’s no way to know which region the user is in without checking all of them. Either add an index on email per region (expensive) or design your access patterns around the home region.

4. Transactions Without Pretending

CockroachDB transactions are serializable by default. Not snapshot isolation, not read-committed. Serializable. This is stricter than Postgres default, which means some patterns that work in Postgres will return retry errors here.

The retry pattern

Any transaction can fail with a serialization error (SQLSTATE 40001). Your application must retry. This is not optional.

// Idiomatic Go retry loop
import "github.com/cockroachdb/cockroach-go/v2/crdb/crdbpgx"

err := crdbpgx.ExecuteTx(ctx, pool, pgx.TxOptions{}, func(tx pgx.Tx) error {
    var balance int
    if err := tx.QueryRow(ctx,
        "SELECT balance FROM accounts WHERE id = $1", accountID,
    ).Scan(&balance); err != nil {
        return err
    }

    if balance < amount {
        return errors.New("insufficient funds")
    }

    _, err := tx.Exec(ctx,
        "UPDATE accounts SET balance = balance - $1 WHERE id = $2",
        amount, accountID)
    return err
})

The crdbpgx.ExecuteTx helper handles retries automatically. If you’re not using a CockroachDB-aware driver helper, write the retry loop yourself with exponential backoff.

SELECT FOR UPDATE

CockroachDB supports SELECT ... FOR UPDATE and you should use it for any read-then-write transaction. Without it, you’ll get more retry errors.

BEGIN;
SELECT balance FROM accounts WHERE id = $1 FOR UPDATE;
UPDATE accounts SET balance = balance - $2 WHERE id = $1;
COMMIT;

FOR UPDATE acquires a lock at read time, preventing the serialization conflict that would otherwise cause a retry.

Read-only transactions

Read-only transactions can run with AS OF SYSTEM TIME to get follower reads. This is enormously faster in multi-region setups:

SELECT * FROM products
AS OF SYSTEM TIME follower_read_timestamp()
WHERE category = 'electronics';

follower_read_timestamp() returns a time slightly in the past (default 4.8 seconds) so the read can be served from any replica without contacting the leaseholder. For analytical queries this is a huge win. The staleness is bounded and predictable. For dashboards, reporting, and most read-heavy paths it’s invisible to users.

You can also wrap entire transactions in follower-read mode:

BEGIN AS OF SYSTEM TIME follower_read_timestamp();
SELECT count(*) FROM orders WHERE status = 'shipped';
SELECT sum(total) FROM orders WHERE created_at > '2025-06-01';
COMMIT;

Every statement in the transaction reads at the same consistent past timestamp. Combined with multi-statement reports this becomes a cheap snapshot read across many tables.

5. Performance Tuning

EXPLAIN ANALYZE

CockroachDB’s EXPLAIN is more useful than Postgres’s because it shows network costs.

EXPLAIN ANALYZE (DISTSQL)
SELECT u.email, o.total
FROM users u JOIN orders o ON u.id = o.user_id
WHERE u.home_region = 'us-east'
  AND o.created_at > '2025-06-01';

The output shows execution time, network bytes, and which nodes participated. If you see cross-region traffic on a query that should be local, fix your schema or query.

Statement diagnostics

CockroachDB has built-in statement bundles, which capture everything about a slow query for offline analysis:

ACTIVATE STATEMENT DIAGNOSTICS FOR SELECT * FROM orders WHERE ...;

After the query runs, download the bundle from the DB Console and inspect it offline. This has saved me hours of in-production debugging.

Index hints

The cost-based optimizer in 24.3 is good but not perfect. When it picks the wrong index, you can hint:

SELECT * FROM orders@orders_user_id_idx WHERE user_id = $1;

Use sparingly. Index hints become a maintenance burden over time.

6. Operations And Backups

Backups

Backup to object storage (S3, GCS, Azure Blob):

BACKUP INTO 's3://my-backups/cluster/{nodelocal://1/temp}?AUTH=specified&AWS_ACCESS_KEY_ID=...&AWS_SECRET_ACCESS_KEY=...'
AS OF SYSTEM TIME '-10s';

Schedule incremental backups:

CREATE SCHEDULE backup_daily
FOR BACKUP INTO 's3://my-backups/cluster' WITH revision_history
RECURRING '@daily'
FULL BACKUP '@weekly'
WITH SCHEDULE OPTIONS first_run = 'now';

CockroachDB 24.3 backups are incremental and parallel by default. A full cluster backup of a 5 TB cluster typically takes 10-20 minutes.

Upgrades

Rolling upgrades work. Take one node down at a time, upgrade the binary, restart, move on. The cluster remains available throughout.

# On the node being upgraded
cockroach node drain --certs-dir=/certs --host=node1:26257
# Stop the process, swap binaries, restart
systemctl restart cockroachdb

After all nodes are upgraded, finalize:

SET CLUSTER SETTING version = '24.3';

This unlocks new features that depend on all nodes being on the new version.

Common Pitfalls

1. Hot ranges from sequential writes

Already mentioned but worth repeating. SERIAL primary keys, monotonic timestamps as the leading column of an index, and any other monotonic key creates hot ranges. Use UUIDs or hash-prefixed keys.

2. Forgetting transactions can retry

Code that works in Postgres because it implicitly relies on read-committed semantics will return serialization errors on CockroachDB. Wrap every transaction in a retry loop.

3. Cross-region queries you didn’t plan for

A schema migration that adds a new index without thinking about region placement can suddenly make queries cross-region. Always check EXPLAIN ANALYZE after schema changes in multi-region clusters.

4. Trusting SELECT count(*) to be fast

count(*) requires scanning the table. On a big table, that’s slow. CockroachDB has no equivalent to Postgres’s pg_class.reltuples estimate. If you need fast counts, maintain a counter table updated by triggers or application code.

Troubleshooting

Cluster says “ranges underreplicated”

Usually means a node went down or storage filled up. Check crdb_internal.ranges for ranges with fewer replicas than expected:

SELECT range_id, replicas, learner_replicas, range_size
FROM crdb_internal.ranges_no_leases
WHERE array_length(replicas, 1) < 3
LIMIT 10;

If a node is down, replicas will be rebuilt on remaining nodes after server.time_until_store_dead (default 5 minutes).

Transaction takes forever then fails with RETRY_SERIALIZABLE

Lock contention. Two transactions are touching the same row in incompatible ways. Find them:

SELECT * FROM crdb_internal.cluster_locks
WHERE waiting_count > 0
ORDER BY waiting_count DESC;

The fix is usually SELECT FOR UPDATE in the right places or shorter transactions.

Performance degraded after rolling upgrade

The optimizer learns from statistics, and statistics rebuild after upgrades. Wait a day or run ANALYZE explicitly:

ANALYZE TABLE orders;

For more on the underlying SQL features, the Postgres wire protocol documentation is useful since CockroachDB implements it precisely.

Wrapping Up

CockroachDB 24.3 is a credible production database for teams that genuinely need multi-region resilience and horizontal scaling. It is not a replacement for Postgres in every case. Single-region applications without scaling problems are better served by tuned Postgres. The complexity tax of distributed SQL only pays off when you actually need distributed properties.

If you’re evaluating distributed SQL options, also look at YugabyteDB for global apps which takes a different approach to the same problem.