Healthchecks and Service Dependencies in Docker Compose

July 11, 2022 · 4 min read · by Muhammad Amal programming

TL;DR — healthcheck: per service + depends_on: condition: service_healthy on consumers = correct boot ordering. service_started only waits for process start (rarely enough). service_completed_successfully waits for an init container to exit 0. No more “wait-for-it.sh.”

After env vars, the next universal Compose problem is services starting in the wrong order. Postgres takes 3 seconds to be ready; your API starts in 0.5 seconds; the first connection attempt fails. Without healthchecks, that race recurs on every cold boot.

The healthcheck directive

services:
  postgres:
    image: postgres:14-alpine
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
      POSTGRES_DB: app
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d app"]
      interval: 5s
      timeout: 3s
      retries: 10
      start_period: 5s

Three forms of test:

["CMD", "executable", "arg1", "arg2"] — exec, no shell
["CMD-SHELL", "shell expression"] — runs through /bin/sh
"shell expression" — string form, same as CMD-SHELL

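As a sketch of the three forms side by side, the fragment below writes one Redis check three ways. The service names redis_exec, redis_shell, and redis_string are made up for illustration; only the test value differs between them.

```yaml
services:
  # Exec form: runs redis-cli directly, no shell involved.
  redis_exec:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]

  # Shell form: the string is handed to /bin/sh inside the container,
  # so pipes and || work.
  redis_shell:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD-SHELL", "redis-cli ping | grep -q PONG"]

  # String form: shorthand for CMD-SHELL with the same string.
  redis_string:
    image: redis:7-alpine
    healthcheck:
      test: "redis-cli ping | grep -q PONG"
```

The exec form only needs the binary itself; the two shell forms also need /bin/sh in the image, which matters for minimal images that ship without a shell.
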
Tuning:

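interval: how often to check. 5s for fast-booting services; 30s (the default) for stable ones.
timeout: how long a single check may run before it counts as a failure. The default is 30s; 3s suits a quick local probe.
retries: consecutive failures before the container is marked unhealthy. The default is 3; 10 gives boot tolerance.
start_period: grace period during which failures don't count toward retries. Useful for slow-starting services (Elasticsearch).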

The container reports (health: starting) in docker ps until the test first succeeds, then (healthy).
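
To make the tuning knobs concrete, here is a rough timing budget for the Postgres values used above, written as comments. The numbers are approximations: they assume each check returns quickly and ignore the time spent running the check itself.

```yaml
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U app -d app"]
  interval: 5s       # first check runs about 5s after start, then every 5s
  timeout: 3s        # a check that runs longer than 3s counts as a failure
  retries: 10        # 10 consecutive failures flip the state to unhealthy
  start_period: 5s   # failures in the first 5s do not count toward retries
  # Best case:  the first check passes, so the container is healthy
  #             roughly 5s after start.
  # Worst case: every check fails, so the container is marked unhealthy
  #             after roughly start_period + retries * interval
  #             = 5s + 10 * 5s = 55s.
```

Anything gated on service_healthy waits somewhere between those two bounds, so interval sets the floor for boot time and retries sets the ceiling.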

depends_on conditions

api:
  depends_on:
    postgres:
      condition: service_healthy
    migrator:
      condition: service_completed_successfully
    redis:
      condition: service_started

Three condition values:

service_healthy — wait for the dependency’s healthcheck to pass. Use when “is this ready to accept connections?” matters. Most common.

service_started — wait for the dependency’s process to be running. No healthcheck needed. Rarely sufficient — process running ≠ ready.

service_completed_successfully — wait for the dependency to exit with status 0. For init/migration containers that should finish before the app starts.
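
For comparison, the short list syntax of depends_on behaves like service_started on every entry, which is why it does not fix the race. A sketch of both spellings follows; the service names api_short and api_long are hypothetical, and my-app:latest is a placeholder image.

```yaml
services:
  postgres:
    image: postgres:14-alpine
    environment:
      POSTGRES_PASSWORD: app
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      retries: 10

  # Short syntax: starts as soon as the postgres container has been
  # started, whether or not it accepts connections yet.
  api_short:
    image: my-app:latest
    depends_on:
      - postgres

  # Long syntax: starts only after the postgres healthcheck has passed.
  api_long:
    image: my-app:latest
    depends_on:
      postgres:
        condition: service_healthy
```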

A complete boot sequence

services:
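  postgres:
    image: postgres:14-alpine
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app   # the image refuses to start without a password
      POSTGRES_DB: app
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d app"]
      interval: 5s
      retries: 10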

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      retries: 10

  migrator:
    image: my-app:latest
    command: ["migrate", "up"]
    depends_on:
      postgres: { condition: service_healthy }

  api:
    image: my-app:latest
    command: ["server"]
    depends_on:
      postgres: { condition: service_healthy }
      redis: { condition: service_healthy }
      migrator: { condition: service_completed_successfully }
    ports: ["8080:8080"]

  worker:
    image: my-app:latest
    command: ["worker"]
    depends_on:
      postgres: { condition: service_healthy }
      redis: { condition: service_healthy }
      migrator: { condition: service_completed_successfully }

The flow:

Postgres + Redis start, healthchecks begin
Migrator starts once Postgres is (healthy), runs, exits 0
API + worker start once Postgres and Redis are both (healthy) and the migrator has exited 0

docker compose up blocks at each gate. First-run can take ~30 seconds; subsequent runs (with cached data volumes) ~10 seconds. All deterministic.

Healthcheck recipes per service

Postgres:

healthcheck:
  test: ["CMD-SHELL", "pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB"]
  interval: 5s
  retries: 10

Note the $$: it escapes $ in Compose YAML, so the variables are expanded by the shell inside the container instead of by Compose on the host.
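
A side-by-side sketch of the difference, assuming POSTGRES_USER and POSTGRES_DB are not set in the shell (or .env file) that runs docker compose. The service names db_wrong and db_right are illustrative.

```yaml
services:
  db_wrong:
    image: postgres:14-alpine
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
      POSTGRES_DB: app
    healthcheck:
      # Single $: Compose substitutes from the host environment while
      # parsing the file. Unset variables become empty strings (with a
      # warning), so the container's own values are never used.
      test: ["CMD-SHELL", "pg_isready -U $POSTGRES_USER -d $POSTGRES_DB"]

  db_right:
    image: postgres:14-alpine
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
      POSTGRES_DB: app
    healthcheck:
      # Double $$: Compose passes a literal $ through, and the shell in
      # the container expands it from the container's environment.
      test: ["CMD-SHELL", "pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB"]
```

Running docker compose config prints the file after interpolation, which makes a swallowed variable easy to spot.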

Redis:

healthcheck:
  test: ["CMD", "redis-cli", "ping"]
  interval: 5s
  retries: 10

MySQL:

healthcheck:
  test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
  interval: 5s
  retries: 10

Generic HTTP service:

healthcheck:
  test: ["CMD-SHELL", "wget -qO- http://localhost:8080/healthz || exit 1"]
  interval: 10s
  retries: 5

Kafka:

healthcheck:
  test: ["CMD-SHELL", "kafka-broker-api-versions --bootstrap-server localhost:9092"]
  interval: 10s
  retries: 10
  start_period: 30s

Kafka takes 20-30 seconds to start. start_period matters.

When healthchecks fail

If a healthcheck fails enough times to exhaust retries:

Container marked unhealthy
Orchestrators react differently: Docker Swarm replaces the container, while plain Compose only reports the status
Dependent services gated on service_healthy are not started; docker compose up reports the dependency as failed instead

To debug:

docker compose ps                                              # see state
docker inspect --format '{{json .State.Health}}' <container>   # see healthcheck history
docker compose logs <service>                                  # see why

The State.Health.Log field in docker inspect has the last 5 healthcheck outputs.

What healthchecks DON’T do

Don’t restart unhealthy containers in Compose (Swarm does; Compose doesn’t)
Don’t replace orchestrator-level liveness probes
Don’t verify deep correctness — only that the test passes

For production-grade health, use the orchestrator’s own probes (Kubernetes liveness/readiness). Compose healthchecks are for boot-order coordination.

Common Pitfalls

No healthcheck. Default depends_on only waits for process start. Race conditions return.

service_started everywhere. Almost always wrong. Use service_healthy.

Forgetting $$ in healthcheck commands. $ is consumed by Compose for variable substitution. Escape with $$ to pass $ into the shell.

Long interval on fast services. A 30s interval means 30s before the first check; deps wait that long. Use 5s for boot-time.

No start_period on slow-starting services. Elasticsearch, Kafka, and similar services can take a minute to be ready, and with the default start_period of 0 every failed check during that time counts toward retries. Set start_period.

Healthcheck that hits the network. Should be internal (loopback). External healthcheck adds latency and dependency on network state.
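
The interval and start_period pitfalls can be sketched together on a slow starter. The image tag, the disabled security setting, and the 60s grace period are assumptions for a local single-node setup, not recommendations, and the check assumes curl is available inside the image. The start_interval line is left commented out because it requires Docker Engine 25.0 or newer; leave it out if your version rejects it.

```yaml
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.13.4  # assumed tag
    environment:
      discovery.type: single-node
      xpack.security.enabled: "false"   # local dev only, so plain HTTP works
    healthcheck:
      # Loopback check from inside the container, no external network involved.
      test: ["CMD-SHELL", "curl -fsS http://localhost:9200/_cluster/health > /dev/null || exit 1"]
      interval: 5s          # short, so dependents are released soon after it is ready
      timeout: 3s
      retries: 10
      start_period: 60s     # failures during the first minute do not count
      # start_interval: 2s  # optional: check more often during start_period
```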

Wrapping Up

healthcheck: per service + depends_on: condition: service_healthy = no more “wait-for-it” scripts. Compose handles the boot order. Wednesday: Compose Watch + live reload, the dev-loop improvement that landed in v2.22.
