docker.recipes

Apache Airflow

advanced

Platform to programmatically author and schedule workflows.

Overview

Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows. Originally created at Airbnb in 2014, Airflow allows data engineers to define workflows as code using Python, providing programmatic control over complex data pipelines with rich dependency management, retry logic, and monitoring capabilities. The platform has become the industry standard for orchestrating ETL processes, ML pipelines, and data engineering workflows across organizations from startups to Fortune 500 companies.

This Docker stack combines Airflow with PostgreSQL as the metadata database and Redis as the message broker for Celery-based distributed task execution. PostgreSQL stores all workflow metadata, task states, and execution history, while Redis handles the message queuing between the scheduler and worker nodes. The CeleryExecutor configuration enables horizontal scaling of task execution across multiple worker containers, making this setup suitable for production workloads that require parallel processing and fault tolerance.

Data engineers, ML engineers, and DevOps teams building automated data pipelines will find this stack particularly valuable. The combination provides enterprise-grade workflow orchestration with the ability to handle complex dependencies, manage task failures gracefully, and scale execution capacity based on workload demands. Unlike simpler cron-based solutions, this Airflow deployment offers rich monitoring, alerting, and the ability to handle dynamic workflows that adapt based on data conditions or external triggers.
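
To make the workflow-as-code idea concrete, here is a minimal sketch of a DAG file. The file name, task names, and schedule are illustrative, and the imports assume a recent Airflow 2.x release with the TaskFlow API. Dropped into the mounted ./dags directory, it defines a daily two-step pipeline with per-task retries:

dags/example_pipeline.py
from datetime import datetime, timedelta

from airflow.decorators import dag, task


@dag(
    dag_id="example_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",            # run once per day
    catchup=False,                # do not backfill missed past runs
    default_args={
        "retries": 2,                         # retry each failed task twice
        "retry_delay": timedelta(minutes=5),  # wait 5 minutes between retries
    },
)
def example_pipeline():
    @task
    def extract() -> dict:
        # pull data from a source system (stubbed here)
        return {"rows": 42}

    @task
    def load(payload: dict) -> None:
        # write the extracted data to a target (stubbed here)
        print(f"loaded {payload['rows']} rows")

    # load runs only after extract succeeds
    load(extract())


example_pipeline()

The scheduler discovers the file automatically once it appears in ./dags; no container restart is required.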

Key Features

  • Python DAG (Directed Acyclic Graph) definition for workflow authoring with rich dependency management
  • CeleryExecutor for distributed task execution across multiple worker containers
  • PostgreSQL metadata database storing workflow definitions, task states, and execution history
  • Redis message broker enabling reliable task queuing and worker communication
  • Web UI at port 8080 for workflow monitoring, task debugging, and manual trigger management
  • Automatic DAG discovery from mounted ./dags directory with hot-reloading capabilities
  • Comprehensive logging system with persistent log storage across container restarts
  • Built-in retry logic, SLA monitoring, and email alerting for production workflow management

Common Use Cases

  • ETL pipeline orchestration for data warehouses with complex data source dependencies
  • ML model training workflows with feature engineering, training, and deployment stages
  • Daily batch processing of business reports with multiple data transformation steps
  • Data lake ingestion pipelines handling various file formats and validation rules
  • API data synchronization workflows pulling from multiple external services
  • Database maintenance tasks including backups, cleanup, and data archival
  • Multi-cloud data replication workflows with failure handling and monitoring

Prerequisites

  • Minimum 4GB RAM (1GB for Airflow components, 1GB for PostgreSQL, 512MB for Redis)
  • Port 8080 available for Airflow web interface access
  • Docker Compose v2 with volume and network support
  • Python knowledge for DAG development and workflow definition
  • Understanding of ETL concepts and data pipeline architecture
  • PostgreSQL connection string format for database configuration

This recipe is intended for development and testing. Review security settings, change default credentials, and test thoroughly before production use. See Terms

docker-compose.yml

docker-compose.yml
services:
  airflow-webserver:
    image: apache/airflow:latest
    container_name: airflow-webserver
    command: webserver
    environment:
      AIRFLOW__CORE__EXECUTOR: CeleryExecutor
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${DB_USER}:${DB_PASSWORD}@postgres/${DB_NAME}
      AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
      AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://${DB_USER}:${DB_PASSWORD}@postgres/${DB_NAME}
    volumes:
      - ./dags:/opt/airflow/dags
      - airflow_logs:/opt/airflow/logs
    ports:
      - "8080:8080"
    depends_on:
      - postgres
      - redis
    networks:
      - airflow

  airflow-scheduler:
    image: apache/airflow:latest
    container_name: airflow-scheduler
    command: scheduler
    environment:
      AIRFLOW__CORE__EXECUTOR: CeleryExecutor
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${DB_USER}:${DB_PASSWORD}@postgres/${DB_NAME}
      AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
    volumes:
      - ./dags:/opt/airflow/dags
      - airflow_logs:/opt/airflow/logs
    depends_on:
      - postgres
      - redis
    networks:
      - airflow

  airflow-worker:
    image: apache/airflow:latest
    container_name: airflow-worker
    command: celery worker
    environment:
      AIRFLOW__CORE__EXECUTOR: CeleryExecutor
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${DB_USER}:${DB_PASSWORD}@postgres/${DB_NAME}
      AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
    volumes:
      - ./dags:/opt/airflow/dags
      - airflow_logs:/opt/airflow/logs
    depends_on:
      - postgres
      - redis
    networks:
      - airflow

  postgres:
    image: postgres:16-alpine
    container_name: airflow-postgres
    environment:
      POSTGRES_DB: ${DB_NAME}
      POSTGRES_USER: ${DB_USER}
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - airflow

  redis:
    image: redis:alpine
    container_name: airflow-redis
    networks:
      - airflow

volumes:
  airflow_logs:
  postgres_data:

networks:
  airflow:
    driver: bridge

.env Template

.env
DB_NAME=airflow
DB_USER=airflow
DB_PASSWORD=changeme

Usage Notes

  1. Docs: https://airflow.apache.org/docs/
  2. Access the web UI at http://localhost:8080 and log in with the admin user created in step 5 (the plain apache/airflow image does not ship a default account)
  3. Initialize the metadata database first: docker exec airflow-webserver airflow db init (airflow db migrate on newer releases)
  4. Place Python DAG files in the ./dags folder - they are auto-detected
  5. Create an admin user: docker exec -it airflow-webserver airflow users create --role Admin --username admin --firstname Admin --lastname Admin --email admin@example.com --password <choose-a-password>
  6. CeleryExecutor scales workers horizontally, e.g. docker compose up -d --scale airflow-worker=3 (remove the fixed container_name from the worker service first)

Individual Services (5 services)

Copy individual services to mix and match with your existing compose files.

airflow-webserver
airflow-webserver:
  image: apache/airflow:latest
  container_name: airflow-webserver
  command: webserver
  environment:
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${DB_USER}:${DB_PASSWORD}@postgres/${DB_NAME}
    AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://${DB_USER}:${DB_PASSWORD}@postgres/${DB_NAME}
  volumes:
    - ./dags:/opt/airflow/dags
    - airflow_logs:/opt/airflow/logs
  ports:
    - "8080:8080"
  depends_on:
    - postgres
    - redis
  networks:
    - airflow
airflow-scheduler
airflow-scheduler:
  image: apache/airflow:latest
  container_name: airflow-scheduler
  command: scheduler
  environment:
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${DB_USER}:${DB_PASSWORD}@postgres/${DB_NAME}
    AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
  volumes:
    - ./dags:/opt/airflow/dags
    - airflow_logs:/opt/airflow/logs
  depends_on:
    - postgres
    - redis
  networks:
    - airflow
airflow-worker
airflow-worker:
  image: apache/airflow:latest
  container_name: airflow-worker
  command: celery worker
  environment:
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${DB_USER}:${DB_PASSWORD}@postgres/${DB_NAME}
    AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
  volumes:
    - ./dags:/opt/airflow/dags
    - airflow_logs:/opt/airflow/logs
  depends_on:
    - postgres
    - redis
  networks:
    - airflow
postgres
postgres:
  image: postgres:16-alpine
  container_name: airflow-postgres
  environment:
    POSTGRES_DB: ${DB_NAME}
    POSTGRES_USER: ${DB_USER}
    POSTGRES_PASSWORD: ${DB_PASSWORD}
  volumes:
    - postgres_data:/var/lib/postgresql/data
  networks:
    - airflow
redis
redis:
  image: redis:alpine
  container_name: airflow-redis
  networks:
    - airflow

Quick Start

terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  airflow-webserver:
    image: apache/airflow:latest
    container_name: airflow-webserver
    command: webserver
    environment:
      AIRFLOW__CORE__EXECUTOR: CeleryExecutor
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${DB_USER}:${DB_PASSWORD}@postgres/${DB_NAME}
      AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
      AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://${DB_USER}:${DB_PASSWORD}@postgres/${DB_NAME}
    volumes:
      - ./dags:/opt/airflow/dags
      - airflow_logs:/opt/airflow/logs
    ports:
      - "8080:8080"
    depends_on:
      - postgres
      - redis
    networks:
      - airflow

  airflow-scheduler:
    image: apache/airflow:latest
    container_name: airflow-scheduler
    command: scheduler
    environment:
      AIRFLOW__CORE__EXECUTOR: CeleryExecutor
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${DB_USER}:${DB_PASSWORD}@postgres/${DB_NAME}
      AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
    volumes:
      - ./dags:/opt/airflow/dags
      - airflow_logs:/opt/airflow/logs
    depends_on:
      - postgres
      - redis
    networks:
      - airflow

  airflow-worker:
    image: apache/airflow:latest
    container_name: airflow-worker
    command: celery worker
    environment:
      AIRFLOW__CORE__EXECUTOR: CeleryExecutor
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${DB_USER}:${DB_PASSWORD}@postgres/${DB_NAME}
      AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
    volumes:
      - ./dags:/opt/airflow/dags
      - airflow_logs:/opt/airflow/logs
    depends_on:
      - postgres
      - redis
    networks:
      - airflow

  postgres:
    image: postgres:16-alpine
    container_name: airflow-postgres
    environment:
      POSTGRES_DB: ${DB_NAME}
      POSTGRES_USER: ${DB_USER}
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - airflow

  redis:
    image: redis:alpine
    container_name: airflow-redis
    networks:
      - airflow

volumes:
  airflow_logs:
  postgres_data:

networks:
  airflow:
    driver: bridge
EOF

# 2. Create the .env file
cat > .env << 'EOF'
DB_NAME=airflow
DB_USER=airflow
DB_PASSWORD=changeme
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f

One-Liner

Run this command to download and set up the recipe in one step:

terminal
curl -fsSL https://docker.recipes/api/recipes/airflow/run | bash

Troubleshooting

  • Import errors in DAGs: Check Python dependencies in the ./dags directory and ensure proper syntax (see the snippet after this list)
  • Scheduler not picking up new DAGs: Verify ./dags volume mount and check DAG parsing errors in logs
  • Tasks stuck in queued state: Check Redis connectivity and restart airflow-worker container
  • Database connection failures: Verify PostgreSQL container health and environment variables match
  • Web UI shows 'Airflow not ready': Run 'docker exec airflow-webserver airflow db init' to initialize database
  • Worker tasks timing out: Increase Celery task timeout settings or scale worker replicas
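
When DAGs show up as broken, one way to see the full tracebacks is to parse the DAG folder the same way the scheduler does. The snippet below is a minimal sketch using Airflow's DagBag API (the file name check_dags.py is illustrative); run it inside one of the Airflow containers, for example with docker exec -i airflow-scheduler python < check_dags.py.

check_dags.py
from airflow.models import DagBag

# parse the mounted DAG folder the same way the scheduler does
bag = DagBag(dag_folder="/opt/airflow/dags", include_examples=False)

# any file that failed to import is listed here with its traceback
for path, error in bag.import_errors.items():
    print(f"{path}:\n{error}\n")

print(f"parsed {len(bag.dags)} DAG(s) successfully")

Recent Airflow releases also expose the same information via the airflow dags list-import-errors CLI command.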

