Health Checks and Service Dependencies in Docker Compose

Master service startup order with healthchecks and depends_on to ensure reliable container orchestration.

01 The Problem with depends_on

In its default short form, Docker Compose's depends_on only waits for a container to **start**, not for the application inside to be **ready**. Your web app might crash because it tried to connect to PostgreSQL before the database finished initializing. Healthchecks solve this by letting you define what "ready" actually means.

Without healthchecks, depends_on guarantees start order and nothing more. A started container is not necessarily a ready service.

02 Writing Effective Healthchecks

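A healthcheck runs a command inside your container at regular intervals. If the command exits with 0, the check passes. If it exits with 1, the check fails (Docker reserves exit code 2, so stick to 0 and 1). After the configured number of consecutive failures, the container is marked unhealthy. Docker tracks this state, and Compose can use it to delay dependent services. Note that the Docker Engine does not restart an unhealthy container by itself: restart policies react to container exits, while Swarm mode replaces unhealthy tasks.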
services:
  db:
    image: postgres:16-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s      # Check every 10 seconds
      timeout: 5s        # Wait max 5s for a response
      retries: 5         # Mark unhealthy after 5 consecutive failures
      start_period: 30s  # Grace period for slow startups
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}

start_period gives slow-starting services time to initialize: check failures during this window don't count toward retries, while a successful check marks the container healthy right away.
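
To make the retry counting concrete, here is a minimal shell sketch that replays a sequence of check exit codes and prints the resulting status. It is a simplified model, not Docker's implementation: the function name health_state is hypothetical, and it assumes start_period is not in play, so every failure counts.

```shell
# Simulate how consecutive check failures map to a health status.
# Usage: health_state RETRIES EXIT_CODE...
# Assumption: start_period is ignored, so every failure counts.
health_state() {
  retries=$1
  shift
  failures=0
  state=starting
  for code in "$@"; do
    if [ "$code" -eq 0 ]; then
      failures=0            # any success resets the failure streak
      state=healthy
    else
      failures=$((failures + 1))
      if [ "$failures" -ge "$retries" ]; then
        state=unhealthy     # streak reached the retries limit
      fi
    fi
  done
  echo "$state"
}

health_state 3 0 1 1 1   # prints "unhealthy"
health_state 3 1 1 0 1   # prints "healthy"
```

The second call shows why retries matters: two failures followed by a success never reach the limit, so a brief hiccup doesn't flip the container to unhealthy.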

03 Using condition: service_healthy

The magic happens when you combine healthchecks with depends_on conditions. Instead of just starting in order, Docker will wait until the dependency is actually healthy before starting the dependent service.
services:
  app:
    image: myapp:latest
    depends_on:
      db:
        condition: service_healthy # Wait for healthy
      redis:
        condition: service_started # Just wait for start
      migrations:
        condition: service_completed_successfully # Wait for exit 0

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    healthcheck:
      # $$ stops Compose from interpolating the variable on the host, so the
      # container's shell expands it (falling back to the default user)
      test: ["CMD-SHELL", "pg_isready -U $${POSTGRES_USER:-postgres}"]
      interval: 5s
      timeout: 5s
      retries: 5

  redis:
    image: redis:alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 3

  migrations:
    image: myapp:latest
    command: python manage.py migrate
    depends_on:
      db:
        condition: service_healthy
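
If you need the same kind of wait outside Compose, for example in a CI script, a polling loop around docker inspect is one option. The sketch below is an assumption-laden helper, not how Compose implements service_healthy: wait_healthy is a hypothetical name, the default timeout and poll values are arbitrary, and the container must define a healthcheck.

```shell
# Poll a container's health status until it is healthy or a deadline passes.
# wait_healthy is a hypothetical helper, not part of Docker or Compose.
# Usage: wait_healthy CONTAINER [TIMEOUT_SECONDS] [POLL_SECONDS]
wait_healthy() {
  container=$1
  timeout=${2:-60}
  poll=${3:-2}
  waited=0
  while [ "$waited" -le "$timeout" ]; do
    # Prints starting, healthy, or unhealthy for containers with a healthcheck
    status=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null)
    case "$status" in
      healthy)   return 0 ;;
      unhealthy) echo "$container is unhealthy" >&2; return 1 ;;
    esac
    sleep "$poll"
    waited=$((waited + poll))
  done
  echo "timed out waiting for $container" >&2
  return 2
}
```

Calling wait_healthy container_name 120 before running integration tests would block until the database reports healthy, fail fast if it turns unhealthy, and give up after roughly two minutes.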

04 Healthcheck Patterns for Common Services

Different services need different healthcheck strategies. Here are battle-tested patterns for the most common self-hosted services.
# PostgreSQL
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U postgres"]
  interval: 10s
  timeout: 5s
  retries: 5

# MariaDB (healthcheck.sh ships with the official mariadb image, not with mysql)
healthcheck:
  test: ["CMD", "healthcheck.sh", "--connect", "--innodb_initialized"]
  interval: 10s
  timeout: 5s
  retries: 5

# Redis
healthcheck:
  test: ["CMD", "redis-cli", "ping"]
  interval: 5s
  timeout: 3s
  retries: 3

# MongoDB
healthcheck:
  test: ["CMD", "mongosh", "--eval", "db.adminCommand('ping')"]
  interval: 10s
  timeout: 5s
  retries: 5

# HTTP services (generic; requires curl in the image)
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 30s
  timeout: 10s
  retries: 3

# Nginx (assumes a /nginx-health location is configured and curl is installed)
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost/nginx-health"]
  interval: 30s
  timeout: 5s
  retries: 3

For HTTP healthchecks, create a lightweight /health endpoint that checks database connectivity and returns 200 OK.

05 Complex Startup Sequences

Real applications often have multi-stage startup: database first, then migrations, then optional seed data, finally the app. Model this as a dependency chain with appropriate conditions.
services:
  # Stage 1: Database
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD} # The image refuses to start without it
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

  # Stage 2: Run migrations (one-time task)
  migrations:
    image: myapp:latest
    command: ["./manage.py", "migrate", "--noinput"]
    depends_on:
      postgres:
        condition: service_healthy
    restart: "no" # Don't restart after success

  # Stage 3: Seed data (optional, one-time)
  seed:
    image: myapp:latest
    command: ["./manage.py", "loaddata", "initial_data.json"]
    depends_on:
      migrations:
        condition: service_completed_successfully
    restart: "no"

  # Stage 4: Application
  app:
    image: myapp:latest
    command: ["gunicorn", "app:application"]
    depends_on:
      postgres:
        condition: service_healthy
      migrations:
        condition: service_completed_successfully
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

service_completed_successfully only works for containers that exit. Don't use it for long-running services.
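
To see why, here is a rough sketch of what the condition amounts to, written as a manual check against docker inspect. It is an illustration rather than Compose's own logic, and completed_ok is a hypothetical helper name.

```shell
# Report whether a one-shot container has exited with status 0.
# completed_ok is a hypothetical helper name.
# Usage: completed_ok CONTAINER
completed_ok() {
  # Output looks like "exited 0" once the container has finished cleanly
  result=$(docker inspect --format '{{.State.Status}} {{.State.ExitCode}}' "$1") || return 2
  [ "$result" = "exited 0" ]
}
```

A long-running service stays in the running state, so a check like this never succeeds for it. That is the situation a dependent service would be stuck waiting on.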

06 Debugging Healthcheck Issues

When healthchecks fail mysteriously, use these commands to diagnose the problem.
# Check current health status
docker inspect --format='{{.State.Health.Status}}' container_name

# View healthcheck logs (last 5 checks)
docker inspect --format='{{json .State.Health}}' container_name | jq

# Run the healthcheck manually
docker exec container_name pg_isready -U postgres

# Watch health status in real-time
watch -n 2 "docker ps --format 'table {{.Names}}\t{{.Status}}'"

# Check why a container is unhealthy
docker inspect container_name | jq '.[0].State.Health.Log'

If a healthcheck works manually but fails in Docker, check for missing tools in the container or PATH issues.
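
One quick way to rule out missing tools is to ask the container's own shell which commands it can find. The helper below is a small sketch: missing_tools is a hypothetical name, and the tool names in the usage comment are only examples.

```shell
# Print the names of the given commands that are NOT on PATH.
# missing_tools is a hypothetical helper name.
# Usage: missing_tools TOOL...
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# The same loop can be run inside a container (tool names are examples):
#   docker exec container_name sh -c \
#     'for t in curl wget; do command -v "$t" >/dev/null || echo "missing: $t"; done'
```

If the tool your healthcheck calls shows up as missing, either install it in the image or switch the check to a command the image already ships.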

07 Healthcheck Best Practices

Follow these guidelines to write reliable healthchecks that improve your stack's resilience:

**1. Keep healthchecks fast** - They run frequently; slow checks waste resources
**2. Check what matters** - A database healthcheck should verify it can accept connections, not just that the process is running
**3. Use start_period** - Give slow services (Elasticsearch, Java apps) time to initialize
**4. Don't healthcheck everything** - Simple, stateless containers often don't need them
**5. Match intervals to service criticality** - Critical services: 5-10s, less critical: 30-60s
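
When picking intervals, it can help to estimate how long a broken service might go unnoticed. The sketch below is a rough upper bound under a simplifying assumption: each failing check waits a full interval and then runs until its timeout. The function name detection_window is hypothetical, and real timing will vary.

```shell
# Rough upper bound, in seconds, on the time from a service breaking to the
# container being marked unhealthy.
# Usage: detection_window INTERVAL TIMEOUT RETRIES (all in whole seconds)
detection_window() {
  interval=$1
  timeout=$2
  retries=$3
  echo $(( retries * (interval + timeout) ))
}

detection_window 5 5 5     # critical service: prints 50
detection_window 30 10 3   # less critical service: prints 120
```

By this estimate, short intervals with several retries can flag a failure sooner than long intervals with few retries, which is worth weighing against the extra load of frequent checks.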