Structured Logging in Go Microservices with Zap

Structured Logging in Go Microservices with Zap

January 21, 2022 · 5 min read · by Muhammad Amal programming

TL;DR — Use zap. JSON output in production, console in dev. Build a request-scoped logger middleware that injects request_id, user_id, and route. Sample noisy paths. Never log secrets. Stdout only.

After the config loader, the second piece every Go microservice needs at boot is a logger. The temptation is to use log.Println everywhere and move on. That works for a hello-world. It does not survive contact with production.

In production you need three things from your logs that log.Println does not give you: a parseable format (so log aggregators like Loki, Datadog, or CloudWatch Insights can index fields), a way to attach context (request ID, user ID, trace ID) without threading it through every function, and the ability to keep up with traffic without becoming the bottleneck. uber-go/zap does all three with the lowest overhead I’ve found.

This post is the zap setup we drop into every new Go service. Boring. Reliable. Survives kubectl logs and your log aggregator’s grep.

Why zap specifically

Three contenders worth knowing in 2022: zap, zerolog, and the standard library log package (with slog arriving in 1.21 later this year, but not in 1.17). Brief comparison.

zap. ~600 ns per structured log line. JSON encoder built in. Two APIs: a fast typed-field API (logger.Info("...", zap.String("key", "v"))) and a slower sugared API (sugar.Infof(...)). Stable since 2017. The API you’ll meet in most Go shops.

zerolog. Similar performance to zap. Slightly different ergonomics with chained .Str("k", "v") calls. Also good. If you’re greenfield and the team prefers its API, fine. If you’re inheriting other Go code, you’re probably looking at zap.

stdlib log. Unstructured. Adequate for CLIs. Don’t ship it in microservices.

We use zap. The rest of this post is what we do with it.

The boot-time setup

package logging

import (
    "os"

    "go.uber.org/zap"
    "go.uber.org/zap/zapcore"
)

func New(env, level string) (*zap.Logger, error) {
    lvl, err := zapcore.ParseLevel(level)
    if err != nil {
        return nil, err
    }

    var cfg zap.Config
    switch env {
    case "production", "staging":
        cfg = zap.NewProductionConfig()
        cfg.EncoderConfig.TimeKey = "ts"
        cfg.EncoderConfig.MessageKey = "msg"
        cfg.EncoderConfig.LevelKey = "level"
        cfg.EncoderConfig.EncodeTime = zapcore.ISO8601TimeEncoder
    default:
        cfg = zap.NewDevelopmentConfig()
        cfg.EncoderConfig.EncodeLevel = zapcore.CapitalColorLevelEncoder
    }

    cfg.Level = zap.NewAtomicLevelAt(lvl)
    cfg.OutputPaths = []string{"stdout"}
    cfg.ErrorOutputPaths = []string{"stderr"}
    cfg.InitialFields = map[string]interface{}{
        "service": os.Getenv("APP_NAME"),
        "env":     env,
    }
    return cfg.Build()
}

In main.go:

logger, err := logging.New(cfg.Env, cfg.LogLevel)
if err != nil {
    log.Fatalf("logger init: %v", err)
}
defer logger.Sync()

logger.Sync() flushes any buffered log lines on exit. Critical for short-lived processes like Kafka consumer one-shots.

A production line ends up looking like this:

{"level":"info","ts":"2022-01-21T09:14:33.812Z","service":"billing","env":"production","msg":"subscription started","customer_id":"cus_abc123","plan_id":"plan_pro_monthly","duration_ms":47}

Parseable. Indexable. Greppable in Cloud Logging. Same shape across every service.

Request-scoped logging

The pattern that pays off the most: a middleware that injects a logger into every request’s context, pre-populated with request_id, route, and (after auth) user_id.

package httplog

import (
    "context"
    "net/http"
    "time"

    "github.com/google/uuid"
    "go.uber.org/zap"
)

type ctxKey struct{}

func With(ctx context.Context) *zap.Logger {
    if l, ok := ctx.Value(ctxKey{}).(*zap.Logger); ok {
        return l
    }
    return zap.NewNop()
}

func Middleware(base *zap.Logger) func(http.Handler) http.Handler {
    return func(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            reqID := r.Header.Get("X-Request-ID")
            if reqID == "" {
                reqID = uuid.NewString()
            }

            l := base.With(
                zap.String("request_id", reqID),
                zap.String("method", r.Method),
                zap.String("path", r.URL.Path),
            )

            start := time.Now()
            ctx := context.WithValue(r.Context(), ctxKey{}, l)
            sw := &statusWriter{ResponseWriter: w, status: 200}

            next.ServeHTTP(sw, r.WithContext(ctx))

            l.Info("request",
                zap.Int("status", sw.status),
                zap.Duration("duration", time.Since(start)),
            )
        })
    }
}

type statusWriter struct {
    http.ResponseWriter
    status int
}

func (s *statusWriter) WriteHeader(code int) {
    s.status = code
    s.ResponseWriter.WriteHeader(code)
}

Now any handler can do:

func (h *Handler) StartSubscription(w http.ResponseWriter, r *http.Request) {
    log := httplog.With(r.Context())
    log.Info("starting subscription", zap.String("plan_id", planID))
    // ...
}

The log line carries request_id, method, path, service, env — all without the handler having to thread anything through. Add zap.String("user_id", userID) to the logger after auth resolves and every subsequent log in that request is automatically correlated.

Sampling for noisy paths

Health checks at 1 Hz. Kafka consumers logging every message at debug. Background workers polling every second. These can drown your real signal in noise and rack up your log bill.

Zap’s built-in sampling helps:

core := zapcore.NewSamplerWithOptions(
    cfg.Core(),
    time.Second,
    100,  // log first 100 entries per second per (level, message) tuple
    10,   // log every 10th entry after that
)

For one-off chatty paths, a per-call sampling decision is sometimes simpler:

if rand.Intn(100) == 0 {
    log.Info("kafka consumed", zap.String("topic", t), zap.Int64("offset", o))
}

Bias toward sampling debug-level traces, never sampling errors or warnings.

Common Pitfalls

Logging secrets. API keys, tokens, passwords, full email bodies, payment card numbers. The temptation to “just log the request body for debugging” eventually catches you. Add a redaction layer to your middleware or never log raw request bodies. PII enforcement is a separate concern but it starts here.

Calling sugared methods in hot paths. logger.Sugar().Infow(...) allocates more than logger.Info(...). For log lines on the request path, use the typed-field API. Sugar is fine for boot, fine for one-offs, expensive at 10K req/s.

Defaulting to debug level in production. Set LOG_LEVEL=info in production. Debug should be the dev default and only flipped on for incident debugging. Forgetting this once costs real money on per-GB log services.

Logging to a file. Don’t. The container should log to stdout, the orchestrator scrapes it, the aggregator indexes it. Logging to a file inside a container is one more thing to forget about, plus you’ll lose lines when the container is terminated.

Putting panic-causing code inside Sync(). Some sinks (notably stderr on macOS development) return errors from Sync(). Wrap it: _ = logger.Sync() or check + log to stderr. Don’t let defer logger.Sync() itself crash the process on shutdown.

Not setting service and env initial fields. When you have ten services in one log aggregator, you cannot tell their lines apart without a service field. Set it once, in the logger init.

Wrapping Up

Zap + a small request-scoped middleware gives you logs you can actually use. JSON in prod, console in dev, request_id and route on every line, sampling for noisy paths, secrets-aware. Next post: graceful shutdown and health checks — the third leg of the boot-time setup that every Go service needs before it goes anywhere near production traffic.

Why zap specifically

The boot-time setup

Request-scoped logging

Sampling for noisy paths

Common Pitfalls

Wrapping Up

Related posts

Instrumenting Go Services for Prometheus

Structured Logging in Rust with tracing

Building an ETL Pipeline in Go

Why Lightweight ETL Beats Big Data Tools for Most Teams

September Retro, One Stack to Watch Them All

Prometheus Cardinality and Cost Control

Service-Level Objectives in Practice

Tempo for Distributed Tracing

Let’s Start a Project