Goroutine Patterns for Production Go Microservices

March 9, 2023 · 7 min read · by Muhammad Amal · programming

TL;DR: go func() is a primitive, not a pattern; production code needs bounded workers, structured cancellation, and explicit backpressure. errgroup plus contexts replaces 90% of hand-rolled WaitGroup code. Unbounded goroutines are the single most common cause of Go memory leaks I’ve debugged.

The Go concurrency model is genuinely good. It’s also genuinely easy to misuse. Most postmortems I’ve read at $work over the past three years involve either (a) a goroutine leak or (b) an unbounded fan-out that took down a downstream system. Both of these are pattern problems, not language problems.

This post catalogs the patterns I actually use in production microservices. It’s not exhaustive — sync.Pool, atomic ops, lock-free ring buffers are their own topic — but it covers the shapes that handle traffic in any meaningfully concurrent Go service.

If you’re coming from the gRPC streaming post, the bidi example used errgroup already. We’ll go deeper here.

The Bounded Worker Pool

The base case. You have a queue of work; you want N goroutines processing in parallel, with a clean shutdown path.

package worker

import (
    "context"
    "sync"
)

type Job func(ctx context.Context) error

type Pool struct {
    size int
    jobs chan Job
    wg   sync.WaitGroup
}

func NewPool(size int) *Pool {
    return &Pool{
        size: size,
        jobs: make(chan Job, size*2),
    }
}

func (p *Pool) Start(ctx context.Context) {
    for i := 0; i < p.size; i++ {
        p.wg.Add(1)
        go func() {
            defer p.wg.Done()
            for {
                select {
                case <-ctx.Done():
                    return
                case job, ok := <-p.jobs:
                    if !ok {
                        return
                    }
                    _ = job(ctx)
                }
            }
        }()
    }
}

func (p *Pool) Submit(ctx context.Context, job Job) error {
    select {
    case p.jobs <- job:
        return nil
    case <-ctx.Done():
        return ctx.Err()
    }
}

func (p *Pool) Shutdown() {
    close(p.jobs)
    p.wg.Wait()
}

A few choices worth defending:

The job channel has a small buffer (size*2). Bigger buffers feel like backpressure but really just delay it. With a buffer of 10,000 you’ve moved the problem from “the pool is full” to “we OOM before we ever notice the pool is full.”

Submit respects the caller’s context. Don’t make submit fire-and-forget — the caller often has a deadline and wants to fail fast if the pool is saturated.

Errors are swallowed at the worker level. In real code you’d log them with structured fields or push them to an error channel; the right call depends on whether the caller needs the result. If it does, use errgroup instead.
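To make the backpressure point concrete, here is a minimal sketch of what a caller sees when the queue is full. It assumes a free function submit that copies the select from Pool.Submit, and a queue that no worker is draining; the names submit and saturatedSubmitTimesOut are made up for this illustration.

```go
package main

import (
    "context"
    "errors"
    "fmt"
    "time"
)

// submit mirrors Pool.Submit: enqueue the job, or give up when ctx is done.
func submit(ctx context.Context, jobs chan<- func(), job func()) error {
    select {
    case jobs <- job:
        return nil
    case <-ctx.Done():
        return ctx.Err()
    }
}

// saturatedSubmitTimesOut fills a queue that no worker is draining, then
// attempts one more submit under a short deadline. It reports whether that
// last submit failed with context.DeadlineExceeded.
func saturatedSubmitTimesOut() bool {
    jobs := make(chan func(), 2) // tiny buffer, nobody receiving
    for i := 0; i < cap(jobs); i++ {
        if err := submit(context.Background(), jobs, func() {}); err != nil {
            return false
        }
    }

    ctx, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
    defer cancel()
    err := submit(ctx, jobs, func() {})
    return errors.Is(err, context.DeadlineExceeded)
}

func main() {
    fmt.Println("saturated submit timed out:", saturatedSubmitTimesOut())
}
```

The caller gets an error within its own deadline instead of queueing behind thousands of buffered jobs, which is the behavior the small buffer is meant to buy.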

errgroup: Structured Concurrency for the Reasonable

The golang.org/x/sync/errgroup package is the closest Go has to structured concurrency. It bundles a WaitGroup, a derived context that cancels on first error, and an error collector. Use it whenever you have a fixed set of concurrent tasks that share a fate.

import (
    "context"

    "golang.org/x/sync/errgroup"
)

func (s *Service) Fanout(ctx context.Context, ids []string) ([]*Result, error) {
    g, gctx := errgroup.WithContext(ctx)
    results := make([]*Result, len(ids))

    for i, id := range ids {
        i, id := i, id // pre-1.22 loop variable capture
        g.Go(func() error {
            r, err := s.fetch(gctx, id)
            if err != nil {
                return err
            }
            results[i] = r
            return nil
        })
    }

    if err := g.Wait(); err != nil {
        return nil, err
    }
    return results, nil
}

The i, id := i, id shadow is necessary because we’re on Go 1.20, and the per-iteration loop variable scoping fix landed in 1.22. If you forget it, every goroutine shares a single i and a single id. Most of them run after the loop has advanced, so they fetch the wrong id and write to the wrong index, leaving results with overwritten and nil entries. Static analyzers catch this: run go vet in CI, where its loopclosure analyzer flags the capture.

Writing to results[i] from multiple goroutines is safe because each goroutine writes to a different index. No mutex needed. Reading the slice after g.Wait() returns is also safe: Wait is built on a WaitGroup, whose happens-before guarantee makes every goroutine’s write visible to the caller.
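To show what errgroup bundles, here is a stdlib-only sketch of the same three pieces: a WaitGroup, a derived context cancelled on the first error, and a first-error collector. The miniGroup type and its helpers are hypothetical names for this illustration; this is not the actual errgroup source, and production code should use the real package.

```go
package main

import (
    "context"
    "errors"
    "fmt"
    "sync"
)

// miniGroup is a hypothetical, stripped-down stand-in for errgroup.Group.
type miniGroup struct {
    wg      sync.WaitGroup
    errOnce sync.Once
    err     error
    cancel  context.CancelFunc
}

// withContext returns a group plus a context that is cancelled on first error.
func withContext(ctx context.Context) (*miniGroup, context.Context) {
    ctx, cancel := context.WithCancel(ctx)
    return &miniGroup{cancel: cancel}, ctx
}

// Go runs f in a goroutine; the first non-nil error is kept and cancels ctx.
func (g *miniGroup) Go(f func() error) {
    g.wg.Add(1)
    go func() {
        defer g.wg.Done()
        if err := f(); err != nil {
            g.errOnce.Do(func() {
                g.err = err
                g.cancel()
            })
        }
    }()
}

// Wait blocks until every goroutine has returned, then reports the first error.
func (g *miniGroup) Wait() error {
    g.wg.Wait()
    g.cancel() // release the context's resources
    return g.err
}

// firstErrorWins starts one failing task and one task that waits for
// cancellation, and reports the collected error and whether the second
// task observed the cancel.
func firstErrorWins() (string, bool) {
    g, gctx := withContext(context.Background())
    sawCancel := make(chan bool, 1)

    g.Go(func() error { return errors.New("boom") })
    g.Go(func() error {
        <-gctx.Done() // unblocked only by the failing task's cancel
        sawCancel <- true
        return gctx.Err()
    })

    err := g.Wait()
    return err.Error(), <-sawCancel
}

func main() {
    msg, saw := firstErrorWins()
    fmt.Println(msg, saw)
}
```

The second task returns its own error too, but only the first one is kept. That is the "shared fate" behavior: one failure cancels the siblings and becomes the group’s result.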

errgroup with Limit

As of golang.org/x/sync v0.1, errgroup.Group has a SetLimit(n) method. Use it to cap concurrency without writing your own pool:

g, gctx := errgroup.WithContext(ctx)
g.SetLimit(8)
for _, id := range ids {
    id := id
    g.Go(func() error {
        return process(gctx, id)
    })
}
return g.Wait()

When the limit is hit, g.Go blocks until a slot is free. Combine this with a context deadline upstream and you have bounded parallel processing in five lines.
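For intuition about that blocking behavior, here is a dependency-free sketch that caps concurrency with a buffered channel used as a semaphore. It is an analogy for SetLimit under that assumption, not how errgroup is implemented, and peakConcurrency is a made-up helper that measures how many tasks ever ran at once.

```go
package main

import (
    "fmt"
    "sync"
    "time"
)

// peakConcurrency runs tasks short jobs with at most limit in flight and
// returns the highest number of jobs observed running at the same time.
func peakConcurrency(limit, tasks int) int {
    sem := make(chan struct{}, limit)
    var (
        wg       sync.WaitGroup
        mu       sync.Mutex
        inFlight int
        peak     int
    )

    for i := 0; i < tasks; i++ {
        sem <- struct{}{} // blocks while limit jobs are running
        wg.Add(1)
        go func() {
            defer wg.Done()
            defer func() { <-sem }() // free the slot before signalling Done

            mu.Lock()
            inFlight++
            if inFlight > peak {
                peak = inFlight
            }
            mu.Unlock()

            time.Sleep(time.Millisecond) // stand-in for real work

            mu.Lock()
            inFlight--
            mu.Unlock()
        }()
    }

    wg.Wait()
    return peak
}

func main() {
    fmt.Println("peak in flight:", peakConcurrency(3, 20))
}
```

The spawning loop is what blocks, just as g.Go does at the limit, so the producer slows down instead of piling up goroutines.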

Fan-Out / Fan-In with Channels

Channels still earn their keep when you have a pipeline of stages and each stage has different parallelism. Single producer, N parallel processors, single consumer.

func Pipeline(ctx context.Context, in <-chan *Order) <-chan *EnrichedOrder {
    out := make(chan *EnrichedOrder)
    const workers = 8

    var wg sync.WaitGroup
    wg.Add(workers)
    for i := 0; i < workers; i++ {
        go func() {
            defer wg.Done()
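            for {
                // Select on the receive too, so a cancelled context unblocks
                // workers even if the producer never closes in.
                var order *Order
                select {
                case o, ok := <-in:
                    if !ok {
                        return // source drained
                    }
                    order = o
                case <-ctx.Done():
                    return
                }
                enriched, err := enrich(ctx, order)
                if err != nil {
                    continue // or push to error channel
                }
                select {
                case out <- enriched:
                case <-ctx.Done():
                    return
                }
            }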
        }()
    }

    go func() {
        wg.Wait()
        close(out)
    }()

    return out
}

The convention I follow: the function that creates a channel is responsible for closing it. The closer goroutine waits on the WaitGroup, then closes out. Downstream consumers see a clean EOF when the pipeline drains.

This pattern composes. You can stack stages — fetch, enrich, validate, store — each with its own parallelism, connected by channels. The whole pipeline shuts down cleanly when the source channel closes or the context cancels.
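As a sketch of that composition, the example below chains two stages with different worker counts. It uses ints instead of orders to stay self-contained; stage and sumOfSquaresPlusOne are hypothetical helpers written for this illustration.

```go
package main

import (
    "context"
    "fmt"
    "sync"
)

// stage applies fn to every value from in using the given number of workers.
// It closes its output once all workers have exited.
func stage(ctx context.Context, workers int, in <-chan int, fn func(int) int) <-chan int {
    out := make(chan int)
    var wg sync.WaitGroup
    wg.Add(workers)
    for i := 0; i < workers; i++ {
        go func() {
            defer wg.Done()
            for {
                select {
                case v, ok := <-in:
                    if !ok {
                        return // upstream drained
                    }
                    select {
                    case out <- fn(v):
                    case <-ctx.Done():
                        return
                    }
                case <-ctx.Done():
                    return
                }
            }
        }()
    }
    go func() {
        wg.Wait()
        close(out)
    }()
    return out
}

// sumOfSquaresPlusOne feeds 1..n through a 4-worker squaring stage and a
// 2-worker increment stage, then sums whatever comes out.
func sumOfSquaresPlusOne(n int) int {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    src := make(chan int)
    go func() {
        defer close(src) // the producer owns and closes the source
        for i := 1; i <= n; i++ {
            select {
            case src <- i:
            case <-ctx.Done():
                return
            }
        }
    }()

    squared := stage(ctx, 4, src, func(v int) int { return v * v })
    bumped := stage(ctx, 2, squared, func(v int) int { return v + 1 })

    total := 0
    for v := range bumped {
        total += v
    }
    return total
}

func main() {
    fmt.Println(sumOfSquaresPlusOne(4)) // (1+1)+(4+1)+(9+1)+(16+1) = 34
}
```

Ordering across workers is not preserved, which is fine for a sum. If downstream cares about order, carry an index with each item.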

Rate-Limited Dispatch

When fanning out to a downstream that has a rate limit (which is most of them), you need a token bucket, not just a worker count. The golang.org/x/time/rate package is the standard answer.

import (
    "context"
    "net/http"
    "time"

    "golang.org/x/time/rate"
)

type Client struct {
    limiter *rate.Limiter
    http    *http.Client
}

func NewClient(rps int, burst int) *Client {
    return &Client{
        limiter: rate.NewLimiter(rate.Limit(rps), burst),
        http:    &http.Client{Timeout: 5 * time.Second},
    }
}

func (c *Client) Do(ctx context.Context, req *http.Request) (*http.Response, error) {
    if err := c.limiter.Wait(ctx); err != nil {
        return nil, err
    }
    // Bind the request to ctx so cancellation also aborts the HTTP call.
    return c.http.Do(req.WithContext(ctx))
}

limiter.Wait blocks until a token is available or the context cancels. It does not consume CPU spinning. This is the right primitive for “I have 200 concurrent goroutines but the API allows 50 rps” — the goroutines stack up at Wait, the limiter releases them at the configured rate.

Don’t use time.Sleep(time.Second / rate) in a loop. It looks like it works in a benchmark and falls over under jitter.
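To illustrate what "block until a token is available or the context cancels" means, here is a deliberately simplified token bucket built on a buffered channel and a ticker. It is a teaching sketch under those assumptions, with made-up names (bucket, newBucket, burstThenBlock), and not a substitute for rate.Limiter, which handles fractional rates, reservations, and timing far more carefully.

```go
package main

import (
    "context"
    "fmt"
    "time"
)

// bucket is a hypothetical, simplified token bucket.
type bucket struct {
    tokens chan struct{}
}

// newBucket starts full with burst tokens and adds one token per interval
// until ctx is cancelled. Tokens arriving at a full bucket are dropped.
func newBucket(ctx context.Context, every time.Duration, burst int) *bucket {
    b := &bucket{tokens: make(chan struct{}, burst)}
    for i := 0; i < burst; i++ {
        b.tokens <- struct{}{}
    }
    go func() {
        t := time.NewTicker(every)
        defer t.Stop()
        for {
            select {
            case <-t.C:
                select {
                case b.tokens <- struct{}{}:
                default: // bucket already full
                }
            case <-ctx.Done():
                return
            }
        }
    }()
    return b
}

// Wait blocks until a token is available or ctx is done. A blocked
// goroutine is parked by the runtime; it does not spin.
func (b *bucket) Wait(ctx context.Context) error {
    select {
    case <-b.tokens:
        return nil
    case <-ctx.Done():
        return ctx.Err()
    }
}

// burstThenBlock makes three calls against a burst of two with a refill
// interval far longer than the per-call timeout, and counts the grants.
func burstThenBlock() int {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    b := newBucket(ctx, time.Hour, 2)
    granted := 0
    for i := 0; i < 3; i++ {
        waitCtx, stop := context.WithTimeout(ctx, 20*time.Millisecond)
        err := b.Wait(waitCtx)
        stop()
        if err == nil {
            granted++
        }
    }
    return granted
}

func main() {
    fmt.Println("granted:", burstThenBlock())
}
```

The first two calls drain the burst immediately; the third waits, hits its deadline, and returns the context error, which is what a caller with a deadline wants.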

Common Pitfalls

The collection of foot-guns I keep encountering:

Spawning a goroutine per request without a pool. A burst of 50k requests becomes 50k goroutines. Each is cheap (~2 KB) but they all want database connections, CPU, and downstream tokens. The right number is bounded.
go func() without thinking about lifecycle. Who waits for it? Who cancels it? If you can’t answer both, you have a leak.
Closing channels from the receiver side. Channels are closed by the sender. If a receiver closes the channel, the sender’s next send panics. The pipeline pattern above is correct; reversing it is not.
Reading after close vs writing after close. Reading from a closed channel returns the zero value and ok=false. Writing to a closed channel panics. This asymmetry causes a lot of bugs in shutdown paths.
Using sync.WaitGroup without errgroup. If your goroutines can fail and the failure needs to propagate, WaitGroup alone forces you to invent cancellation and error collection. errgroup gives you both.
Pre-1.22 loop variable capture. Mentioned above. On Go 1.20 this is still a real hazard. The fix in 1.22 makes the per-iteration variable the default, but until you upgrade, shadow your variables.
Unbounded time.After in a select. On Go versions before 1.23, the timer behind each time.After call stays live until it fires, so the GC can’t reclaim it early. In a tight loop, you’ll accumulate timers. Use time.NewTimer and Stop it, or use a context deadline.
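The read/write asymmetry on closed channels is easy to verify. The sketch below closes a channel, receives from it, then attempts a send inside a function that recovers the panic; closedChannelBehavior is a made-up helper for this demonstration.

```go
package main

import "fmt"

// closedChannelBehavior reports what a receive on a closed channel yields
// and whether a send on that same channel panicked.
func closedChannelBehavior() (val int, ok bool, sendPanicked bool) {
    ch := make(chan int, 1)
    close(ch)

    // Receiving never blocks or panics: zero value, ok == false.
    val, ok = <-ch

    // Sending panics, even though the buffer has room.
    func() {
        defer func() {
            if recover() != nil {
                sendPanicked = true
            }
        }()
        ch <- 1
    }()

    return val, ok, sendPanicked
}

func main() {
    val, ok, panicked := closedChannelBehavior()
    fmt.Println(val, ok, panicked)
}
```

Recovering the panic here is only for demonstration. In a shutdown path the fix is ownership: one sender closes, and everyone else only receives.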

Wrapping Up

Concurrency in Go is a tool; like all sharp tools it rewards practice and punishes carelessness. Most of the patterns here fit on one page each, but the bugs they prevent take days to diagnose. Memorize errgroup, internalize bounded pools, and treat every go func() as a question — who owns its lifecycle? The next post takes this into a specific case: connection pools and resource lifecycle for database-backed gRPC services.
