Connection Pooling for gRPC and Postgres in Go
TL;DR: gRPC client connections are multiplexed, so one per target is usually right. Postgres connections are not multiplexed, so size the pool to your concurrency × work-per-request. Pool sizing is the math: pool_size = max_concurrent_queries, capped by max_connections / num_replicas. The default *sql.DB settings are wrong for any non-trivial service.
Pools are where engineering taste shows. Get them right and a service handles 10x its baseline traffic gracefully. Get them wrong and a single slow query takes down everything because every other request is waiting for a connection. I’ve seen both. Worth a careful post.
This continues from the interceptors post where we built middleware. Here we move to the resource lifecycle question: how many connections, how long they live, when to close them, how to size the pool against your real workload.
gRPC: One ClientConn Is Usually Enough
A grpc.ClientConn is a single HTTP/2 connection (well, sort of — see below). HTTP/2 multiplexes streams over a connection. A single connection can handle hundreds of concurrent RPCs.
Practically: create one *grpc.ClientConn per downstream service, hold it for the lifetime of the process, share it across goroutines. Don’t open a connection per call; the handshake cost is real and the pool overhead is wasted.
conn, err := grpc.Dial(
    "billing.svc.cluster.local:50051",
    grpc.WithTransportCredentials(credentials.NewTLS(tlsConfig)),
    grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(8*1024*1024)),
    grpc.WithKeepaliveParams(keepalive.ClientParameters{
        Time:                30 * time.Second,
        Timeout:             10 * time.Second,
        PermitWithoutStream: true,
    }),
    grpc.WithDefaultServiceConfig(`{
        "loadBalancingPolicy": "round_robin",
        "methodConfig": [{
            "name": [{}],
            "retryPolicy": {
                "maxAttempts": 3,
                "initialBackoff": "0.1s",
                "maxBackoff": "1s",
                "backoffMultiplier": 2.0,
                "retryableStatusCodes": ["UNAVAILABLE"]
            }
        }]
    }`),
)
Things this gets right:
- TLS at the transport level. Plaintext is for localhost-only dev.
- Keepalive pings to detect dead connections faster than TCP keepalive’s default.
- Round-robin load balancing — only useful when the resolver returns multiple addresses, e.g., via DNS or xDS.
- Retry policy as a service config string. JSON because the format is the same across language clients.
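If you’re talking to a Kubernetes service with multiple pods, use the dns:///billing.svc.cluster.local:50051 scheme or set up xDS for proper load balancing. A bare host:port target passed to grpc.Dial bypasses gRPC’s DNS resolver: the connection goes to a single resolved IP and stays there, which means one pod gets all your traffic. The name also has to belong to a headless Service; a regular ClusterIP Service resolves to one virtual IP, so round_robin has nothing to rotate over.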
When You Do Want Multiple ClientConns
The exception: a single ClientConn means a single HTTP/2 connection per backend. If your throughput per connection is bottlenecked (rare, but possible on very high-RPS services), you can pool connections manually. Round-robin across N ClientConns for the same target, sharing them across the service. I’ve only needed this twice in production, and both times the right fix turned out to be elsewhere.
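If you do end up pooling, the rotation itself is small. Here is a minimal sketch of a round-robin picker, written with generics so it compiles using only the standard library. RoundRobin, NewRoundRobin, and Pick are hypothetical names, not part of grpc-go; in a real service the element type would be *grpc.ClientConn, with each connection dialed separately against the same target.

```go
package main

import "sync/atomic"

// RoundRobin hands out items from a fixed slice in rotation.
// It is safe for concurrent use: the only mutable state is an atomic counter.
type RoundRobin[T any] struct {
	items []T
	next  atomic.Uint64
}

// NewRoundRobin copies items so later changes to the caller's slice have no effect.
// It panics on an empty slice because there would be nothing to pick from.
func NewRoundRobin[T any](items []T) *RoundRobin[T] {
	if len(items) == 0 {
		panic("roundrobin: need at least one item")
	}
	return &RoundRobin[T]{items: append([]T(nil), items...)}
}

// Pick returns the next item in rotation, wrapping around at the end.
func (r *RoundRobin[T]) Pick() T {
	// Add returns the incremented value; subtract 1 so the first pick is index 0.
	n := r.next.Add(1) - 1
	return r.items[n%uint64(len(r.items))]
}
```

Each RPC would call Pick() to choose the connection its client stub is built on. The counter is atomic rather than mutex-guarded because the picker sits on every call path.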
Postgres: pgx and the Pool That Matters
pgx v5 is what I use. The database/sql driver is fine, but pgxpool exposes more knobs and better defaults.
import "github.com/jackc/pgx/v5/pgxpool"

cfg, err := pgxpool.ParseConfig(connString)
if err != nil {
    return nil, err
}
cfg.MaxConns = 20
cfg.MinConns = 5
cfg.MaxConnLifetime = 30 * time.Minute
cfg.MaxConnIdleTime = 5 * time.Minute
cfg.HealthCheckPeriod = 1 * time.Minute
pool, err := pgxpool.NewWithConfig(ctx, cfg)
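The defaults: MaxConns defaults to 4 or runtime.NumCPU(), whichever is bigger. MinConns is 0. MaxConnLifetime is one hour and MaxConnIdleTime is 30 minutes. For a single-instance dev service, these are fine. For a service handling real traffic, they’re a guess that knows nothing about your workload or your database’s connection budget, so set each one deliberately.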
The Sizing Math
Three things drive pool size:
- Concurrent queries per instance. If you have 100 in-flight RPCs and each makes one DB query at a time, you need ~100 connections to avoid queueing. Or you accept queueing and size smaller.
- Database max_connections. Postgres defaults to 100. Subtract overhead for admin tools, replication, etc. Divide by the number of service instances. If you have 10 instances and 80 connections to share, each instance gets 8. Period.
- PgBouncer or another pooler. With transaction pooling between your service and Postgres, you can size aggressively in your app pool, but you lose session-level features (LISTEN/NOTIFY, prepared statements with names) and have to be careful about transactions.
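The formula I use as a starting point, where avg_queries_per_request counts the queries one request has in flight at the same time (sequential queries reuse a single connection):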
pool_size = min(
    concurrent_requests_per_instance * avg_queries_per_request,
    (postgres_max_connections - reserved) / num_instances
)
Then I tune from there based on production query latency and connection wait time. Both should be metrics.
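As a sanity check, the formula translates directly into a small function. This is a sketch, not a sizing tool: PoolSize and its parameter names are made up for illustration, and the floor of 1 and the guard against a zero instance count are extra safety checks that are not part of the formula.

```go
package main

// PoolSize returns a starting value for the per-instance pool size:
// the smaller of what the workload asks for and what the database can give.
// queriesPerRequest is the number of queries one request runs at the same time.
func PoolSize(concurrentRequests, queriesPerRequest, pgMaxConnections, reserved, instances int) int {
	if instances < 1 {
		instances = 1 // avoid dividing by zero on a bad input
	}
	demand := concurrentRequests * queriesPerRequest
	budget := (pgMaxConnections - reserved) / instances // integer division rounds down

	size := demand
	if budget < size {
		size = budget
	}
	if size < 1 {
		size = 1 // a pool with no connections is never useful
	}
	return size
}
```

With 100 concurrent requests running one query each, max_connections of 100, 20 reserved, and 10 instances, PoolSize returns 8: the database budget wins over the workload’s demand of 100.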
Connection Lifetime
MaxConnLifetime is non-obvious. Long-lived connections accumulate query plans, type oids, and (sometimes) memory bloat on the server side. They also resist failover — if your primary goes away, idle connections to it linger until the OS kills them.
I set MaxConnLifetime to 30 minutes. Long enough that the churn is negligible; short enough that connections recycle through DNS, get distributed across replicas, and don’t accumulate state.
MaxConnIdleTime is shorter, 5 minutes. Idle connections cost memory on the database side. Closing them when traffic dies down, so the database gets those slots back, is the polite thing to do.
Using the Pool in Handlers
The pool is shared across all handlers and goroutines. Treat it as a singleton dependency.
type Repo struct {
    pool *pgxpool.Pool
}

func (r *Repo) FindInvoice(ctx context.Context, id string) (*Invoice, error) {
    var inv Invoice
    err := r.pool.QueryRow(ctx,
        `SELECT id, customer_id, total_cents, currency, created_at
         FROM invoices WHERE id = $1`, id,
    ).Scan(&inv.ID, &inv.CustomerID, &inv.TotalCents, &inv.Currency, &inv.CreatedAt)
    if err != nil {
        if errors.Is(err, pgx.ErrNoRows) {
            return nil, ErrNotFound
        }
        return nil, err
    }
    return &inv, nil
}
Always pass the request’s ctx, not context.Background(). The pool’s QueryRow accepts a context, propagates the deadline to the wire, and cancels the query if the context fires. Without a deadline or cancellation, a slow query holds its connection until the network times out.
For transactions:
func (r *Repo) Transfer(ctx context.Context, fromID, toID string, cents int64) error {
    tx, err := r.pool.Begin(ctx)
    if err != nil {
        return err
    }
    defer tx.Rollback(ctx)

    if _, err := tx.Exec(ctx,
        `UPDATE accounts SET balance_cents = balance_cents - $1 WHERE id = $2`,
        cents, fromID,
    ); err != nil {
        return err
    }
    if _, err := tx.Exec(ctx,
        `UPDATE accounts SET balance_cents = balance_cents + $1 WHERE id = $2`,
        cents, toID,
    ); err != nil {
        return err
    }
    return tx.Commit(ctx)
}
defer tx.Rollback(ctx) after a successful commit is a no-op — pgx tracks the transaction state. The defer is your safety net for early returns.
Metrics That Matter
You want a dashboard with at least:
- Pool size: total, in-use, idle. From pool.Stat().
- Acquire wait time: how long calls block waiting for a connection. p50, p99.
- Query duration: per query type if you can, overall otherwise.
- Connection age: helps debug long-lived connection issues.
pgxpool.Pool.Stat() exposes the counters; wire them to your metrics backend (Prometheus, OpenTelemetry).
If acquire wait time is non-zero at p50, your pool is too small. If it’s zero across all percentiles, you might have headroom or you might be over-sized; check idle connection counts.
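The pool’s counters are cumulative, so a useful number comes from the difference between two scrapes. The sketch below assumes you copy AcquireCount(), AcquireDuration(), and EmptyAcquireCount() from pool.Stat() into a plain struct on each scrape; PoolSnapshot, MeanAcquireWait, and QueuedFraction are hypothetical names, not pgx types. It yields a mean rather than p50 or p99; for percentiles you would have to time individual acquires and feed a histogram.

```go
package main

import "time"

// PoolSnapshot holds cumulative pool counters copied at one moment in time.
// It is an illustration-only struct, filled from a pool's stats by the caller.
type PoolSnapshot struct {
	AcquireCount      int64         // successful acquires so far
	AcquireDuration   time.Duration // total time spent in successful acquires
	EmptyAcquireCount int64         // acquires that had to wait because the pool was empty
}

// MeanAcquireWait returns the average time per acquire between two snapshots.
// It returns 0 when no acquires happened in the interval.
func MeanAcquireWait(prev, cur PoolSnapshot) time.Duration {
	n := cur.AcquireCount - prev.AcquireCount
	if n <= 0 {
		return 0
	}
	return (cur.AcquireDuration - prev.AcquireDuration) / time.Duration(n)
}

// QueuedFraction returns the share of acquires in the interval that found the
// pool empty and had to wait, as a value between 0 and 1.
func QueuedFraction(prev, cur PoolSnapshot) float64 {
	n := cur.AcquireCount - prev.AcquireCount
	if n <= 0 {
		return 0
	}
	return float64(cur.EmptyAcquireCount-prev.EmptyAcquireCount) / float64(n)
}
```

A queued fraction that keeps climbing under steady traffic is an early hint that the pool is undersized, and it tends to show up before the mean wait moves much.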
Common Pitfalls
The recurring list:
- Opening a pool per request. People wrap pgxpool.New in their handler. Stop. Open once at startup, share everywhere.
- defer pool.Close() in the wrong scope. If you close the pool at the end of a request, every subsequent request fails. Pool lifecycle = process lifecycle.
- Ignoring pool stats. No metrics on pool utilization is flying blind. The first sign of an outage is usually queueing on connection acquire.
- Holding transactions across RPC calls. A transaction holds a connection until commit or rollback. If you do an external call inside a transaction, you’re blocking that connection for the call’s duration. Plus, if the call fails, you have to handle the rollback explicitly.
- Forgetting to close rows. pgx.Rows from Query (not QueryRow) must be closed. defer rows.Close() right after the query.
- MaxConns higher than postgres.max_connections / instances. Your pool will succeed locally and fail when the database refuses new connections. Test under load with the real budget.
- No keepalive on gRPC client connections. Idle connections that look fine to your code but are dead on the wire. Keepalive surfaces this within seconds instead of minutes.
Wrapping Up
Pool sizing is one of those things that you tune by watching metrics, not by guessing. Start with the formula, measure acquire wait time and query duration, adjust. The next post moves to load balancing and service discovery — closely related, because the pool sizing argument changes depending on how requests are distributed across instances. Read the pgx documentation if you want a deeper tour; the project is well-maintained and the design rationale is documented in the issues.