Health Checks and Graceful Shutdown in Go Web Services
January 24, 2022 · 6 min read · by Muhammad Amal · programming

TL;DR — Liveness checks ask “is the process alive?”, readiness asks “should it get traffic right now?”. On SIGTERM: flip readiness to fail, sleep briefly so the load balancer drops you, then server.Shutdown(ctx) to drain in-flight requests.

After config and logging, the third boot-time piece every Go service needs is a working shutdown story. Specifically: liveness + readiness HTTP endpoints, signal handling, and http.Server.Shutdown wired up so a rolling deploy doesn’t drop in-flight requests.

This is one of those things that “just works” until it doesn’t. The first time you do a rolling deploy of a Go service and watch a 0.4% error spike right when the new replicas come up, you’ll realize the default http.ListenAndServe is not actually safe for production. Here’s the pattern that fixes that.

Liveness vs readiness — they are not the same

This trips people up. Both are HTTP endpoints. Both return 200 when “everything is fine.” The distinction is what an orchestrator does when each one fails.

Liveness answers: “is the process alive and capable of doing work?” If liveness fails, the orchestrator kills and restarts the container. It’s the dead man’s switch.

Readiness answers: “should this instance receive traffic right now?” If readiness fails, the orchestrator stops routing traffic to it (but does not restart it). It’s the “I’m draining” signal.

Pragmatic implementations:

  • Liveness = “the HTTP server is still answering”. Almost trivial:
    func liveness(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(200)
        w.Write([]byte("ok"))
    }
    
  • Readiness = “I can serve real traffic”. Checks the dependencies the service cannot work without (here, the database), each with a timeout:
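    func (h *Handler) readiness(w http.ResponseWriter, r *http.Request) {
        // Draining: fail fast without touching the database.
        if h.shuttingDown.Load() {
            http.Error(w, "draining", http.StatusServiceUnavailable)
            return
        }

        // Bound the dependency check so a hung DB cannot hang the probe.
        ctx, cancel := context.WithTimeout(r.Context(), 500*time.Millisecond)
        defer cancel()
        if err := h.db.PingContext(ctx); err != nil {
            http.Error(w, "db unavailable", http.StatusServiceUnavailable)
            return
        }

        w.WriteHeader(http.StatusOK)
        w.Write([]byte("ready"))
    }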
    

The shuttingDown flag is the trick that makes graceful shutdown work. We’ll wire it up in a moment.

The shutdown sequence

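Here’s the boot+shutdown pattern in main.go. The db handle (a *sql.DB) and apiRouter() are assumed to be set up elsewhere, and atomic.Bool requires Go 1.19 or later: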

package main

import (
    "context"
    "errors"
    "log"
    "net/http"
    "os/signal"
    "sync/atomic"
    "syscall"
    "time"
)

func main() {
    var shuttingDown atomic.Bool

    ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
    defer stop()

    mux := http.NewServeMux()
    mux.HandleFunc("/healthz", liveness)
    mux.HandleFunc("/readyz", readinessHandler(&shuttingDown, db))
    mux.Handle("/v1/", apiRouter())

    srv := &http.Server{
        Addr:              ":8080",
        Handler:           mux,
        ReadHeaderTimeout: 5 * time.Second,
    }

    go func() {
        log.Printf("listening on %s", srv.Addr)
        if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
            log.Fatalf("listen: %v", err)
        }
    }()

    <-ctx.Done()
    log.Println("shutdown signal received")

    // Step 1: flip readiness immediately.
    shuttingDown.Store(true)

    // Step 2: wait long enough for the load balancer to notice.
    // The exact number depends on your readiness-probe interval.
    time.Sleep(5 * time.Second)

    // Step 3: drain in-flight requests with a hard ceiling.
    shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()
    if err := srv.Shutdown(shutdownCtx); err != nil {
        // Usually a timeout: some requests were still in flight.
        log.Printf("graceful shutdown failed: %v", err)
        return
    }
    log.Println("clean exit")
}

The three-step sequence is the whole point.

Step 1: flip readiness. As soon as you receive SIGTERM, mark yourself not-ready. Existing connections continue; new ones from the load balancer stop arriving once it notices.

Step 2: sleep. This is the part most tutorials skip. The orchestrator does not check readiness on every request; it polls every few seconds. If you skip straight to step 3, the load balancer will keep sending new requests right up until the moment you close the listener. Sleeping for a duration slightly longer than the readiness poll interval gives the LB time to drop you from the rotation.

Step 3: drain. srv.Shutdown(ctx) closes the listener, closes idle connections, and waits for in-flight requests to complete or for ctx to expire, whichever comes first. 30 seconds is a reasonable ceiling; longer than that and an orchestrator will SIGKILL you anyway.

Kubernetes-specific nuances

Two things to know if you’re deploying to Kubernetes (we are; this matters).

terminationGracePeriodSeconds. Defaults to 30 seconds. Must be longer than your readiness sleep + your shutdown drain timeout, or k8s will SIGKILL you mid-drain. The main.go above sleeps 5 and drains for up to 30, so the default is already too short; set it to 45 to leave a safety margin.

The preStop hook. Some teams put the pre-drain sleep into a Kubernetes preStop hook instead of in the application:

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 5"]

This works because Kubernetes runs the preStop hook, waits for it to complete, then sends SIGTERM. The 5-second sleep gives the iptables update time to propagate. The downside: your container image needs /bin/sh (and sleep), which kills the distroless dream.

We do the readiness flip + sleep inside the Go process. It avoids the preStop dance and means our shutdown logic is testable in isolation.

A working readiness check

func readinessHandler(shuttingDown *atomic.Bool, db *sql.DB) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        if shuttingDown.Load() {
            http.Error(w, "draining", http.StatusServiceUnavailable)
            return
        }

        ctx, cancel := context.WithTimeout(r.Context(), 500*time.Millisecond)
        defer cancel()

        if err := db.PingContext(ctx); err != nil {
            http.Error(w, "db unavailable: "+err.Error(), http.StatusServiceUnavailable)
            return
        }

        w.WriteHeader(http.StatusOK)
        w.Write([]byte("ready"))
    }
}

Two checks: not shutting down, and DB reachable. If both pass, the handler returns 200; each failure mode returns 503 with a descriptive body. Don’t check Kafka, Redis, and every other downstream: that creates cascading failure (Redis blip = your readiness fails = LB drops you = traffic piles up on remaining replicas). Check only the dependencies whose absence makes you genuinely unable to serve.

Common Pitfalls

Using the same endpoint for liveness and readiness. Now the orchestrator restarts your process every time the DB blips, even if a restart wouldn’t help. Liveness should almost never fail unless the process is genuinely stuck.

No timeout on the readiness DB check. A DB ping that hangs for 30 seconds ties up the probe handler for just as long, so probes time out and pile up instead of getting a fast, clear 503. Always wrap in context.WithTimeout.

Skipping the readiness sleep before drain. This is the single most common source of “rolling deploys cause errors”. Sleep 5 seconds. It’s free.

Not handling SIGTERM. Default Go behaviour on SIGTERM is “the process exits immediately”. In-flight requests drop. Wire up signal.NotifyContext (Go 1.16+) and shut down cleanly.

Shutdown ctx shorter than your slowest in-flight request. If your 95th percentile latency is 8 seconds and your drain ctx is 5 seconds, you’ll drop the long tail. Either set the drain timeout above your worst-case latency, or cap request timeouts so they finish faster than your drain window.

Forgetting ReadHeaderTimeout. Without it, a slow-headers attack can keep connections open forever. Set it to 5 seconds. Always.

Wrapping Up

Liveness + readiness + a three-step shutdown sequence (flip → sleep → drain) is the pattern that makes Go services play nice with rolling deploys. It’s about 40 lines of code. It eliminates a whole class of “0.4% error spike during deploy” tickets. Next post: the sub-15 MB Go production image, pulling the threads from earlier weeks together into the actual artifact we deploy.