Healthchecks and Service Dependencies in Docker Compose
July 11, 2022 · 4 min read · by Muhammad Amal · programming

TL;DR: healthcheck: per service + depends_on: condition: service_healthy on consumers = correct boot ordering. service_started only waits for the container to start (rarely enough). service_completed_successfully waits for an init container to exit 0. No more “wait-for-it.sh.”

After env vars, the next universal compose problem: services starting in the wrong order. Postgres takes 3 seconds to be ready; your API starts in 0.5 seconds; first connection attempt fails. Without healthchecks, this fails every cold boot.

The healthcheck directive

services:
  postgres:
    image: postgres:14-alpine
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
      POSTGRES_DB: app
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d app"]
      interval: 5s
      timeout: 3s
      retries: 10
      start_period: 5s

Three forms of test:

  • ["CMD", "executable", "arg1", "arg2"] — exec, no shell
  • ["CMD-SHELL", "shell expression"] — runs through /bin/sh
  • "shell expression" — string form, same as CMD-SHELL
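As a rough illustration, the three forms can be written side by side as below. The service names (cache, db, db_string_form) are made up for this sketch, and the probes assume the stock redis and postgres images, which ship redis-cli and pg_isready.

```yaml
services:
  cache:
    image: redis:7-alpine
    healthcheck:
      # Exec form: no shell involved, so no pipes, && or variable expansion.
      test: ["CMD", "redis-cli", "ping"]

  db:
    image: postgres:14-alpine
    environment:
      POSTGRES_PASSWORD: app
    healthcheck:
      # Shell form: the string is run by the container's shell.
      test: ["CMD-SHELL", "pg_isready -U postgres || exit 1"]

  db_string_form:   # hypothetical duplicate, only to show the string form
    image: postgres:14-alpine
    environment:
      POSTGRES_PASSWORD: app
    healthcheck:
      # Plain string: shorthand for the CMD-SHELL form above.
      test: pg_isready -U postgres || exit 1
```

The exec form is the safer default when the probe is a single command; reach for the shell forms only when the probe needs shell features such as || or environment variables.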

Tuning:

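  • interval: how often to check. 5s for fast-booting services; 30s (the Docker default) for stable ones.
  • timeout: how long a single check may run before it counts as a failure. The Docker default is 30s; 3s is plenty for a local probe.
  • retries: consecutive failures before marking unhealthy. The default is 3; 10 gives boot tolerance.
  • start_period: grace period during which failures don’t count. Useful for slow-starting services (Elasticsearch).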
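To see how the four settings combine, here is a sketch with a back-of-the-envelope time budget in the comments. The service name, image name, port, and the 60s grace period are placeholders chosen for illustration, and the probe assumes wget exists in the image.

```yaml
services:
  search:                      # hypothetical slow-starting service
    image: my-search:latest    # placeholder image name
    healthcheck:
      test: ["CMD-SHELL", "wget -qO- http://localhost:9200/ || exit 1"]
      interval: 5s        # one probe roughly every 5s
      timeout: 3s         # a probe running longer than 3s counts as failed
      retries: 10         # 10 consecutive failures => unhealthy
      start_period: 60s   # failures in the first 60s are not counted
      # Rough worst case before the container is marked unhealthy:
      #   start_period + retries * interval = 60s + 10 * 5s = about 110s
      # (a little more if probes run into the timeout).
```

A successful probe during start_period still flips the container to healthy right away, so a generous grace period does not slow down a service that boots quickly.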

The container shows (healthy) in the STATUS column of docker ps once the test passes; until then it shows (health: starting).

depends_on conditions

api:
  depends_on:
    postgres:
      condition: service_healthy
    migrator:
      condition: service_completed_successfully
    redis:
      condition: service_started

Three condition values:

service_healthy — wait for the dependency’s healthcheck to pass. Use when “is this ready to accept connections?” matters. Most common.

service_started — wait for the dependency’s process to be running. No healthcheck needed. Rarely sufficient — process running ≠ ready.

service_completed_successfully — wait for the dependency to exit with status 0. For init/migration containers that should finish before the app starts.
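One detail worth spelling out: the short list form of depends_on behaves like service_started, so it gives no readiness guarantee. A minimal sketch of the two spellings follows, assuming a postgres service with a healthcheck is defined elsewhere in the same file; api_gated is a made-up second service used only for contrast.

```yaml
services:
  api:
    image: my-app:latest
    # Short form: same as condition: service_started for each entry.
    depends_on:
      - postgres

  api_gated:                # hypothetical, for contrast only
    image: my-app:latest
    # Long form: waits until the postgres healthcheck reports healthy.
    depends_on:
      postgres:
        condition: service_healthy
```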

A complete boot sequence

services:
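  postgres:
    image: postgres:14-alpine
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app   # the official image refuses to start without a password
      POSTGRES_DB: app
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d app"]
      interval: 5s
      retries: 10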

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      retries: 10

  migrator:
    image: my-app:latest
    command: ["migrate", "up"]
    depends_on:
      postgres: { condition: service_healthy }

  api:
    image: my-app:latest
    command: ["server"]
    depends_on:
      postgres: { condition: service_healthy }
      redis: { condition: service_healthy }
      migrator: { condition: service_completed_successfully }
    ports: ["8080:8080"]

  worker:
    image: my-app:latest
    command: ["worker"]
    depends_on:
      postgres: { condition: service_healthy }
      redis: { condition: service_healthy }
      migrator: { condition: service_completed_successfully }

The flow:

  1. Postgres + Redis start, healthchecks begin
  2. Compose waits for both to be (healthy)
  3. Migrator runs, exits 0
  4. API + worker start

docker compose up blocks at each gate. First-run can take ~30 seconds; subsequent runs (with cached data volumes) ~10 seconds. All deterministic.

Healthcheck recipes per service

Postgres:

healthcheck:
  test: ["CMD-SHELL", "pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB"]
  interval: 5s
  retries: 10

Note the $$: it escapes $ in Compose YAML, so the variables are expanded by the shell inside the container (from the Postgres env) rather than by Compose on the host.
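The difference is easiest to see with both spellings next to each other. This is a sketch that assumes POSTGRES_USER and POSTGRES_DB are not set in the host shell or in a .env file; the single-$ line is left commented out as the version to avoid.

```yaml
services:
  postgres:
    image: postgres:14-alpine
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
      POSTGRES_DB: app
    healthcheck:
      # Single $: Compose substitutes these from the host environment or .env
      # when it parses the file. If they are unset there, they become empty
      # strings (with a warning) and the probe runs "pg_isready -U  -d ".
      # test: ["CMD-SHELL", "pg_isready -U $POSTGRES_USER -d $POSTGRES_DB"]

      # Double $$: Compose emits a literal $, so the shell inside the
      # container expands the variables from the container's environment.
      test: ["CMD-SHELL", "pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB"]
      interval: 5s
      retries: 10
```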

Redis:

healthcheck:
  test: ["CMD", "redis-cli", "ping"]
  interval: 5s
  retries: 10

MySQL:

healthcheck:
  test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
  interval: 5s
  retries: 10

Generic HTTP service:

healthcheck:
  test: ["CMD-SHELL", "wget -qO- http://localhost:8080/healthz || exit 1"]
  interval: 10s
  retries: 5
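Not every image ships wget. If the image has curl instead, a variant along these lines should work; it assumes the same /healthz path and port 8080 as the recipe above, and that curl is installed in the image.

```yaml
healthcheck:
  # -f: exit non-zero on HTTP 4xx/5xx; -sS: no progress bar, but keep errors.
  # "|| exit 1" normalises curl's exit codes to the 1 that Docker expects.
  test: ["CMD-SHELL", "curl -fsS -o /dev/null http://localhost:8080/healthz || exit 1"]
  interval: 10s
  timeout: 3s
  retries: 5
```

For minimal images with neither tool, the usual options are installing one in the Dockerfile or adding a small health subcommand to the application binary itself.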

Kafka:

healthcheck:
  test: ["CMD-SHELL", "kafka-broker-api-versions --bootstrap-server localhost:9092"]
  interval: 10s
  retries: 10
  start_period: 30s

Kafka takes 20-30 seconds to start. start_period matters.

When healthchecks fail

If a healthcheck fails enough times to exhaust retries:

  • Container marked unhealthy
  • Swarm replaces an unhealthy task; plain Compose only reports the status
  • Dependents gated on service_healthy are not started, and docker compose up exits with a "dependency failed to start" error

To debug:

docker compose ps                  # see state
docker inspect <container>         # see healthcheck history
docker inspect --format '{{json .State.Health}}' <container>   # health section only
docker compose logs <service>      # see why

The State.Health.Log field in docker inspect has the last 5 healthcheck outputs.

What healthchecks DON’T do

  • Don’t restart unhealthy containers in Compose (Swarm does; Compose doesn’t)
  • Don’t replace orchestrator-level liveness probes
  • Don’t verify deep correctness — only that the test passes

For production-grade health, use the orchestrator’s own probes (Kubernetes liveness/readiness). Compose healthchecks are for boot-order coordination.
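For comparison, the Kubernetes counterpart looks roughly like the fragment below. It is only a sketch of a container entry in a pod spec; the container name, image, path, port, and timing values are placeholders, not settings taken from this series.

```yaml
# Fragment of a Kubernetes pod spec (spec.containers); values are placeholders.
containers:
  - name: api
    image: my-app:latest
    ports:
      - containerPort: 8080
    readinessProbe:          # gates traffic: failing pods are taken out of Service endpoints
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 5
      failureThreshold: 3
    livenessProbe:           # restarts the container when it fails repeatedly
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10
```

The split matters: readiness controls whether traffic is routed to the pod, liveness controls restarts. A Compose healthcheck gives you neither behaviour on its own.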

Common Pitfalls

No healthcheck. Default depends_on only waits for process start. Race conditions return.

service_started everywhere. Almost always wrong. Use service_healthy.

Forgetting $$ in healthcheck commands. $ is consumed by Compose for variable substitution. Escape with $$ to pass $ into the shell.

Long interval on fast services. A 30s interval means 30s before the first check; deps wait that long. Use 5s for boot-time.

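No start_period on slow-starting services. Elasticsearch, Kafka, and similar services can take a minute to be ready. With the default start_period of 0, failed checks count against retries immediately, so the container can be marked unhealthy before it has finished booting. Set start_period.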

Healthcheck that hits the network. Should be internal (loopback). External healthcheck adds latency and dependency on network state.

Wrapping Up

healthcheck: per service + depends_on: condition: service_healthy = no more “wait-for-it” scripts. Compose handles the boot order. Wednesday: Compose Watch + live reload, the dev-loop improvement that became generally available in Compose v2.22.