Healthchecks and Service Dependencies in Docker Compose
TL;DR: healthcheck: per service + depends_on: condition: service_healthy on consumers = correct boot ordering. service_started only waits for process start (rarely enough). service_completed_successfully waits for an init container to exit 0. No more “wait-for-it.sh.”
After env vars, the next universal compose problem: services starting in the wrong order. Postgres takes 3 seconds to be ready; your API starts in 0.5 seconds; first connection attempt fails. Without healthchecks, this fails every cold boot.
The healthcheck directive
services:
  postgres:
    image: postgres:14-alpine
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
      POSTGRES_DB: app
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d app"]
      interval: 5s
      timeout: 3s
      retries: 10
      start_period: 5s
Three forms of test:
- ["CMD", "executable", "arg1", "arg2"]: exec form, no shell
- ["CMD-SHELL", "shell expression"]: runs through /bin/sh
- "shell expression": string form, same as CMD-SHELL
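As a sketch of how the three forms compare, the fragment below writes the same Redis check three ways. The service names are placeholders, and the shell variants assume the image provides /bin/sh and grep (the Alpine-based image does).

```yaml
services:
  # Exec form: runs redis-cli directly, no shell involved.
  redis-exec:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]

  # CMD-SHELL form: runs via /bin/sh -c, so pipes and && / || work.
  redis-shell:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD-SHELL", "redis-cli ping | grep -q PONG"]

  # String form: Compose treats it exactly like CMD-SHELL.
  redis-string:
    image: redis:7-alpine
    healthcheck:
      test: "redis-cli ping | grep -q PONG"
```

The exec form is the safer default when the image has no shell; reach for CMD-SHELL only when the check needs shell features such as pipes or variable expansion.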
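Tuning:
- interval: how often to check. 5s for fast-booting services; 30s (the default) for stable ones.
- timeout: how long a single check may run before it counts as a failure. 3s is plenty for a local check (the default is 30s).
- retries: consecutive failures before the container is marked unhealthy. 10 gives boot tolerance (the default is 3).
- start_period: grace period during which failures don’t count. Useful for slow-starting services (Elasticsearch).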
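To see how these four settings combine, here is a rough worst-case budget worked out in comments. The service name and image are placeholders, and the arithmetic is an approximation: it assumes every check fails and ignores scheduling jitter.

```yaml
services:
  slow-service:              # hypothetical service name
    image: my-app:latest     # placeholder image
    healthcheck:
      test: ["CMD-SHELL", "wget -qO- http://localhost:8080/healthz || exit 1"]
      interval: 5s       # gap between the end of one check and the start of the next
      timeout: 3s        # a check that runs longer than this counts as failed
      retries: 10        # consecutive failures needed to flip to unhealthy
      start_period: 30s  # failures inside this window are not counted
      # Rough time before the container can be marked unhealthy:
      #   checks fail instantly: 30s + 10 * 5s        = about 80s
      #   every check times out: 30s + 10 * (5s + 3s) = about 110s
      # The first successful check marks the container healthy right away,
      # even inside start_period, so a generous grace window costs nothing
      # on a fast boot.
```

If a dependent service regularly gives up before that budget runs out, shortening interval is usually a better lever than cutting retries.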
The container reports (healthy) in docker ps once test succeeds.
depends_on conditions
api:
  depends_on:
    postgres:
      condition: service_healthy
    migrator:
      condition: service_completed_successfully
    redis:
      condition: service_started
Three condition values:
service_healthy — wait for the dependency’s healthcheck to pass. Use when “is this ready to accept connections?” matters. Most common.
service_started — wait for the dependency’s process to be running. No healthcheck needed. Rarely sufficient — process running ≠ ready.
service_completed_successfully — wait for the dependency to exit with status 0. For init/migration containers that should finish before the app starts.
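For comparison, a sketch of the short depends_on syntax next to the long form. The short list carries no condition, so it behaves like service_started. The two api-* names are hypothetical, and the fragment assumes postgres and redis are defined elsewhere in the same file.

```yaml
services:
  api-short:                 # hypothetical name, for comparison only
    image: my-app:latest
    depends_on:              # short form: start order only
      - postgres
      - redis

  api-long:                  # hypothetical name, for comparison only
    image: my-app:latest
    depends_on:              # long form: one condition per dependency
      postgres:
        condition: service_healthy   # requires a healthcheck on postgres
      redis:
        condition: service_started   # same behavior as the short form
```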
A complete boot sequence
services:
  postgres:
    image: postgres:14-alpine
    # The official image refuses to start without POSTGRES_PASSWORD.
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
      POSTGRES_DB: app
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d app"]
      interval: 5s
      retries: 10
  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      retries: 10
  # One-shot container: runs migrations, then exits.
  migrator:
    image: my-app:latest
    command: ["migrate", "up"]
    depends_on:
      postgres: { condition: service_healthy }
  api:
    image: my-app:latest
    command: ["server"]
    depends_on:
      postgres: { condition: service_healthy }
      redis: { condition: service_healthy }
      migrator: { condition: service_completed_successfully }
    ports: ["8080:8080"]
  worker:
    image: my-app:latest
    command: ["worker"]
    depends_on:
      postgres: { condition: service_healthy }
      redis: { condition: service_healthy }
      migrator: { condition: service_completed_successfully }
The flow:
- Postgres + Redis start, healthchecks begin
- Compose waits for both to be (healthy)
- Migrator runs, exits 0
- API + worker start
docker compose up blocks at each gate. First-run can take ~30 seconds; subsequent runs (with cached data volumes) ~10 seconds. All deterministic.
Healthcheck recipes per service
Postgres:
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB"]
  interval: 5s
  retries: 10
Note the $$: it escapes $ in Compose YAML, so the variables are expanded by the shell inside the container rather than substituted by Compose on the host.
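A short sketch of the difference between $ and $$. PG_TAG is a hypothetical host-side variable introduced only for illustration; everything else matches the recipe above.

```yaml
services:
  postgres:
    # Single $: Compose substitutes this on the host before the container
    # exists, falling back to 14-alpine when PG_TAG is unset.
    image: postgres:${PG_TAG:-14-alpine}
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app
      POSTGRES_DB: app
    healthcheck:
      # Double $$: Compose emits a literal $, so /bin/sh inside the
      # container expands POSTGRES_USER and POSTGRES_DB at check time.
      test: ["CMD-SHELL", "pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB"]
      interval: 5s
      retries: 10
```

This only works with CMD-SHELL or the string form. The exec form (CMD) would hand $POSTGRES_USER to pg_isready as a literal string, since no shell is there to expand it.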
Redis:
healthcheck:
  test: ["CMD", "redis-cli", "ping"]
  interval: 5s
  retries: 10
MySQL:
healthcheck:
  test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
  interval: 5s
  retries: 10
Generic HTTP service:
healthcheck:
  test: ["CMD-SHELL", "wget -qO- http://localhost:8080/healthz || exit 1"]
  interval: 10s
  retries: 5
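If the image ships curl rather than wget, an equivalent check might look like the sketch below. The port and the /healthz path are carried over from the recipe above and remain assumptions about your service.

```yaml
healthcheck:
  # -f: treat HTTP 4xx/5xx as failure; -sS: quiet, but still print errors.
  # "|| exit 1" maps every curl error code to 1, the exit status Docker
  # reads as unhealthy (0 is healthy; 2 is reserved).
  test: ["CMD-SHELL", "curl -fsS http://localhost:8080/healthz || exit 1"]
  interval: 10s
  retries: 5
```

Minimal images often include neither tool, in which case the check has to use whatever binary the image does provide.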
Kafka:
healthcheck:
  test: ["CMD-SHELL", "kafka-broker-api-versions --bootstrap-server localhost:9092"]
  interval: 10s
  retries: 10
  start_period: 30s
Kafka takes 20-30 seconds to start. start_period matters.
When healthchecks fail
If a healthcheck fails enough times to exhaust retries:
- Container marked unhealthy
- Docker Swarm replaces the unhealthy task; plain Compose only reports the state
- Dependent services gated on service_healthy are not started: docker compose up fails with a dependency error instead of waiting forever
To debug:
docker compose ps # see state
docker inspect <container> # see healthcheck history
docker compose logs <service> # see why
The State.Health.Log field in docker inspect has the last 5 healthcheck outputs.
What healthchecks DON’T do
- Don’t restart unhealthy containers in Compose (Swarm does; Compose doesn’t)
- Don’t replace orchestrator-level liveness probes
- Don’t verify deep correctness — only that the test passes
For production-grade health, use the orchestrator’s own probes (Kubernetes liveness/readiness). Compose healthchecks are for boot-order coordination.
Common Pitfalls
No healthcheck. Default depends_on only waits for process start. Race conditions return.
service_started everywhere. Almost always wrong. Use service_healthy.
Forgetting $$ in healthcheck commands. $ is consumed by Compose for variable substitution. Escape with $$ to pass $ into the shell.
Long interval on fast services. A 30s interval means 30s before the first check; deps wait that long. Use 5s for boot-time.
No start_period on slow-starting services. Elasticsearch, Kafka, etc. Default 0 means failures count immediately; they take a minute to be ready. Set start_period.
Healthcheck that hits the network. Should be internal (loopback). External healthcheck adds latency and dependency on network state.
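One way to sidestep the long-interval and missing-start_period pitfalls together is the start_interval option, sketched below. This assumes a recent toolchain (Docker Engine 25.0 or later and a Compose release that supports the field); on older versions the option is unavailable. The service name, image, and port are placeholders.

```yaml
services:
  search:                    # hypothetical slow-starting service
    image: my-search:latest  # placeholder image
    healthcheck:
      test: ["CMD-SHELL", "wget -qO- http://localhost:9200/ || exit 1"]
      interval: 30s          # steady-state cadence once the service is up
      timeout: 5s
      retries: 3
      start_period: 60s      # boot failures in this window are not counted
      start_interval: 5s     # probe every 5s during start_period only
```

The split lets dependents unblock within a few seconds of the service becoming ready, without paying for a 5s probe for the rest of the container’s life.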
Wrapping Up
healthcheck: per service + depends_on: condition: service_healthy = no more “wait-for-it” scripts. Compose handles the boot order. Wednesday: Compose Watch + live reload, the dev-loop improvement that landed in v2.22.