Health Checks and Service Dependencies in Docker Compose

01 The Problem with depends_on

Docker Compose's depends_on only waits for a container to **start**, not for the application inside to be **ready**. Your web app might crash because it tried to connect to PostgreSQL before the database finished initializing. Healthchecks solve this by letting you define what 'ready' actually means.

Without healthchecks, depends_on is nearly useless for production. A started container doesn't mean a ready service.
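For contrast, here is a minimal sketch of the short list form of depends_on. It only orders container startup, which is exactly the behaviour described above. myapp:latest is a placeholder image, as in the later examples.

```yaml
services:
  app:
    image: myapp:latest
    depends_on:
      - db          # short syntax: equivalent to condition: service_started
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
```

With this file, Compose starts db before app, but app may still connect while PostgreSQL is initializing.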

02 Writing Effective Healthchecks

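A healthcheck runs a command inside your container at regular intervals. If the command exits with 0, the check passes; if it exits with 1, the check fails, and after the configured number of consecutive failures (retries) the container is marked unhealthy. Docker reserves exit code 2, so don't use it. Docker records the health state and Compose can use it to delay dependent services. Note that plain Docker Engine does not restart an unhealthy container on its own: restart policies only react to the container exiting (Swarm mode, by contrast, replaces unhealthy tasks).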

```yaml
services:
  db:
    image: postgres:16-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s      # Check every 10 seconds
      timeout: 5s        # Give each check at most 5s to finish
      retries: 5         # Mark unhealthy after 5 consecutive failures
      start_period: 30s  # Grace period for slow startups
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
```

start_period gives slow-starting services time to initialize: failed checks during this window don't count toward retries, while a successful check marks the container healthy right away.
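The test field accepts a few forms, and the choice changes how the command runs. Below is a sketch of the same PostgreSQL check written three ways; these are separate fragments, not one valid file. The "|| exit 1" suffix is a common idiom rather than a requirement: pg_isready uses exit code 2 for "no response", and Docker reserves 2, so the shell forms map every failure to 1.

```yaml
# Exec form: runs pg_isready directly with no shell,
# so $VARIABLES, pipes and || are not interpreted
healthcheck:
  test: ["CMD", "pg_isready", "-U", "postgres"]

# Shell form: runs via /bin/sh -c inside the container;
# any non-zero pg_isready status becomes exit code 1
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U postgres || exit 1"]

# A plain string is shorthand for CMD-SHELL
healthcheck:
  test: pg_isready -U postgres || exit 1
```

The exec form needs no shell in the image; the shell forms need /bin/sh but allow variables and fallbacks.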

03 Using condition: service_healthy

The magic happens when you combine healthchecks with depends_on conditions. Instead of just starting in order, Docker will wait until the dependency is actually healthy before starting the dependent service.

```yaml
services:
  app:
    image: myapp:latest
    depends_on:
      db:
        condition: service_healthy    # Wait until the healthcheck passes
      redis:
        condition: service_started    # Only wait for the container to start
      migrations:
        condition: service_completed_successfully  # Wait for exit code 0

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    healthcheck:
      # $$ stops Compose from interpolating the variable when it parses the file,
      # so the shell inside the container expands POSTGRES_USER instead
      test: ["CMD-SHELL", "pg_isready -U $${POSTGRES_USER}"]
      interval: 5s
      timeout: 5s
      retries: 5

  redis:
    image: redis:alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 3

  migrations:
    image: myapp:latest
    command: python manage.py migrate
    depends_on:
      db:
        condition: service_healthy
```

04 Healthcheck Patterns for Common Services

Different services need different healthcheck strategies. Here are battle-tested patterns for the most common self-hosted services.

```yaml
# PostgreSQL
healthcheck:
  test: ["CMD-SHELL", "pg_isready -U postgres"]
  interval: 10s
  timeout: 5s
  retries: 5

# MariaDB (healthcheck.sh ships with the official mariadb image)
healthcheck:
  test: ["CMD", "healthcheck.sh", "--connect", "--innodb_initialized"]
  interval: 10s
  timeout: 5s
  retries: 5

# MySQL (the official mysql image has no healthcheck.sh;
# -h 127.0.0.1 forces TCP instead of the Unix socket)
healthcheck:
  test: ["CMD", "mysqladmin", "ping", "-h", "127.0.0.1"]
  interval: 10s
  timeout: 5s
  retries: 5

# Redis
healthcheck:
  test: ["CMD", "redis-cli", "ping"]
  interval: 5s
  timeout: 3s
  retries: 3

# MongoDB
healthcheck:
  test: ["CMD", "mongosh", "--eval", "db.adminCommand('ping')"]
  interval: 10s
  timeout: 5s
  retries: 5

# HTTP services (generic; requires curl inside the image)
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 30s
  timeout: 10s
  retries: 3

# Nginx (assumes a /nginx-health location is configured and curl is installed)
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost/nginx-health"]
  interval: 30s
  timeout: 5s
  retries: 3
```

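For HTTP healthchecks, create a lightweight /health endpoint that checks database connectivity and returns 200 OK when healthy and an error status (such as 503) when not. curl -f exits non-zero for any HTTP status of 400 or above, so a failing endpoint fails the check.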
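Slim images, Alpine-based ones in particular, often ship without curl. If the image has BusyBox or GNU wget, a sketch of an equivalent probe, assuming the same /health endpoint on port 8080 as the generic HTTP pattern:

```yaml
healthcheck:
  # --spider requests the URL without saving the body;
  # wget exits non-zero when the server returns an HTTP error
  test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/health"]
  interval: 30s
  timeout: 10s
  retries: 3
```

This avoids installing curl just for the healthcheck; check that wget is actually present in your image before relying on it.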

05 Complex Startup Sequences

Real applications often have a multi-stage startup: database first, then migrations, then data seeding or cache warmup, and finally the app. Model this as a dependency chain with appropriate conditions.

```yaml
services:
  # Stage 1: Database
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready"]
      interval: 5s
      timeout: 5s
      retries: 5

  # Stage 2: Run migrations (one-time task)
  migrations:
    image: myapp:latest
    command: ["./manage.py", "migrate", "--noinput"]
    depends_on:
      postgres:
        condition: service_healthy
    restart: "no"  # One-shot task: never restart, so its exit code is final

  # Stage 3: Seed data (optional, one-time)
  seed:
    image: myapp:latest
    command: ["./manage.py", "loaddata", "initial_data.json"]
    depends_on:
      migrations:
        condition: service_completed_successfully
    restart: "no"

  # Stage 4: Application
  app:
    image: myapp:latest
    command: ["gunicorn", "app:application"]
    depends_on:
      postgres:
        condition: service_healthy
      migrations:
        condition: service_completed_successfully
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
```

service_completed_successfully only works for containers that exit. Don't use it for long-running services: the dependent service would wait for an exit that never comes.
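In the chain above, app waits for migrations but not for seed, so once migrations finishes, seed and app start side by side. If the app must not start before the seed data is loaded, here is a sketch of the extra dependency, merged into the app service (for example through an override file):

```yaml
services:
  app:
    depends_on:
      postgres:
        condition: service_healthy
      migrations:
        condition: service_completed_successfully
      seed:
        condition: service_completed_successfully  # also wait for the one-time data load
```

Listing migrations is redundant here, since seed already waits for it, but keeping it makes the app's requirements explicit.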

06 Debugging Healthcheck Issues

When healthchecks fail mysteriously, use these commands to diagnose the problem.

```bash
# Check current health status
docker inspect --format='{{.State.Health.Status}}' container_name

# View the full health state, including the last 5 check results
docker inspect --format='{{json .State.Health}}' container_name | jq .

# Run the healthcheck command manually
docker exec container_name pg_isready -U postgres

# Watch health status in real time
watch -n 2 "docker ps --format 'table {{.Names}}\t{{.Status}}'"

# Show the output of recent checks to see why a container is unhealthy
docker inspect container_name | jq '.[0].State.Health.Log'
```

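If a healthcheck works manually but fails in Docker, check for missing tools in the container (slim and Alpine images often lack curl or bash) and for PATH issues. Also remember that the exec form (CMD) is not run through a shell, so environment variables, pipes and || are not interpreted there.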
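One way to make a missing tool obvious is to have the check say so: Docker stores each probe's output in .State.Health.Log, which the jq commands above print. A sketch for the PostgreSQL check, assuming the image provides a POSIX /bin/sh:

```yaml
healthcheck:
  # Print a clear message (captured in .State.Health.Log) if the tool is missing,
  # otherwise run the real check and map any failure to exit code 1
  test:
    - CMD-SHELL
    - 'command -v pg_isready >/dev/null || { echo "pg_isready not found in PATH"; exit 1; }; pg_isready -U postgres || exit 1'
  interval: 10s
  timeout: 5s
  retries: 5
```

Once the cause is found, you can return to the simpler one-line check.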

07 Healthcheck Best Practices

Follow these guidelines to write reliable healthchecks that improve your stack's resilience:

**1. Keep healthchecks fast** - They run frequently; slow checks waste resources, and a check that runs past its timeout counts as a failure

**2. Check what matters** - A database healthcheck should verify it can accept connections, not just that the process is running

**3. Use start_period** - Give slow services (Elasticsearch, Java apps) time to initialize

**4. Don't healthcheck everything** - Simple, stateless containers often don't need them

**5. Match intervals to service criticality** - Critical services: 5-10s, less critical: 30-60s
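Points 3 to 5 map directly onto Compose keys. Below is a sketch with hypothetical service and image names: a slow Java service gets a short interval and a long start_period, and a stateless helper opts out of any HEALTHCHECK baked into its image. The /health endpoint and port are assumptions, as in the generic HTTP pattern.

```yaml
services:
  search:                                # hypothetical slow-starting Java service
    image: example/search:latest         # placeholder image name
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 10s                      # critical dependency: check often
      timeout: 5s
      retries: 3
      start_period: 120s                 # failures in the first 2 minutes don't count

  log-shipper:                           # hypothetical stateless helper
    image: example/log-shipper:latest    # placeholder image name
    healthcheck:
      disable: true                      # ignore any HEALTHCHECK defined in the image
```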