docker.recipes

Apache Airflow with Celery

Difficulty: advanced

Apache Airflow workflow orchestration with Celery executor and Redis broker.

Overview

Apache Airflow is a platform for programmatically authoring, scheduling, and monitoring workflows using Python DAG definitions. Originally developed at Airbnb and later open-sourced, Airflow has become the industry standard for orchestrating complex data pipelines and ETL processes. The Celery executor enables distributed task execution across multiple worker nodes, making it suitable for high-throughput production environments where tasks need to be processed in parallel across different machines or containers.

This stack combines Airflow's web server, scheduler, and Celery workers with Redis as the message broker and PostgreSQL as the metadata database. Redis handles task queuing between the scheduler and workers, while PostgreSQL stores DAG metadata, task history, execution logs, and Celery task results. The Celery executor allows tasks to be distributed across multiple worker processes, enabling horizontal scaling and fault tolerance that the default SequentialExecutor cannot provide.

Data engineers and DevOps teams running complex ETL pipelines, machine learning workflows, or batch processing jobs benefit from this configuration. The distributed architecture supports organizations processing hundreds of daily workflows with varying resource requirements. Companies migrating from cron jobs or legacy schedulers find this setup provides better visibility, dependency management, and failure recovery compared to traditional approaches.
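
As a minimal sketch of the DAG format this stack runs, the snippet below writes a hypothetical two-task pipeline into ./dags; the DAG id, schedule, and bash commands are placeholders, not part of the recipe.

terminal
# Hypothetical example DAG; the scheduler picks it up from ./dags
mkdir -p dags
cat > dags/example_etl.py << 'EOF'
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Two placeholder tasks: "extract" must finish before "load" starts
with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    load = BashOperator(task_id="load", bash_command="echo loading")
    extract >> load
EOF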

Key Features

  • Python DAG definition with rich scheduling expressions and cron syntax
  • Distributed task execution using Celery workers across multiple containers
  • Web-based UI for monitoring workflow status, logs, and task dependencies
  • Redis message broker for fast task queuing and result backend caching
  • PostgreSQL metadata database with ACID compliance for workflow state
  • Flower monitoring interface for real-time Celery worker and queue visibility
  • Task retry logic with exponential backoff and SLA monitoring
  • Dynamic pipeline generation and templating with Jinja2 support

Common Use Cases

  • Daily ETL pipelines processing data warehouse loads from multiple sources
  • Machine learning model training workflows with GPU resource scheduling
  • Data quality monitoring pipelines with automated alerting and remediation
  • Multi-cloud data synchronization between AWS S3, Google Cloud, and Azure
  • Financial reporting automation with regulatory compliance requirements
  • IoT sensor data processing with real-time aggregation and anomaly detection
  • Marketing campaign automation with customer segmentation and email triggers

Prerequisites

  • 8GB+ RAM recommended for production workloads (4GB minimum for development)
  • Docker and Docker Compose with container orchestration knowledge
  • Python programming experience for writing DAGs and custom operators
  • Understanding of Celery distributed task queues and worker management
  • PostgreSQL administration skills for database maintenance and backups
  • Network access to ports 8080 (Airflow UI) and 5555 (Flower monitoring)

This recipe is intended for development and testing. Review security settings, change default credentials, and test thoroughly before production use.

docker-compose.yml

docker-compose.yml
services:
  postgres:
    image: postgres:16-alpine
    container_name: airflow-db
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=airflow
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U airflow"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - airflow-network

  redis:
    image: redis:7-alpine
    container_name: airflow-redis
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - airflow-network

  airflow-init:
    image: apache/airflow:2.8.0
    container_name: airflow-init
    entrypoint: /bin/bash
    command:
      - -c
      - |
        airflow db init
        airflow users create --username admin --password ${AIRFLOW_ADMIN_PASSWORD} --firstname Admin --lastname User --role Admin --email admin@example.com
    environment:
      - AIRFLOW__CORE__EXECUTOR=CeleryExecutor
      - AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:${POSTGRES_PASSWORD}@postgres/airflow
      - AIRFLOW__CELERY__BROKER_URL=redis://redis:6379/0
      - AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://airflow:${POSTGRES_PASSWORD}@postgres/airflow
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    networks:
      - airflow-network

  airflow-webserver:
    image: apache/airflow:2.8.0
    container_name: airflow-webserver
    command: webserver
    environment:
      - AIRFLOW__CORE__EXECUTOR=CeleryExecutor
      - AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:${POSTGRES_PASSWORD}@postgres/airflow
      - AIRFLOW__CELERY__BROKER_URL=redis://redis:6379/0
      - AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://airflow:${POSTGRES_PASSWORD}@postgres/airflow
      - AIRFLOW__WEBSERVER__SECRET_KEY=${AIRFLOW_SECRET_KEY}
    ports:
      - "8080:8080"
    volumes:
      - ./dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./plugins:/opt/airflow/plugins
    depends_on:
      airflow-init:
        condition: service_completed_successfully
    networks:
      - airflow-network

  airflow-scheduler:
    image: apache/airflow:2.8.0
    container_name: airflow-scheduler
    command: scheduler
    environment:
      - AIRFLOW__CORE__EXECUTOR=CeleryExecutor
      - AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:${POSTGRES_PASSWORD}@postgres/airflow
      - AIRFLOW__CELERY__BROKER_URL=redis://redis:6379/0
      - AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://airflow:${POSTGRES_PASSWORD}@postgres/airflow
    volumes:
      - ./dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./plugins:/opt/airflow/plugins
    depends_on:
      airflow-init:
        condition: service_completed_successfully
    networks:
      - airflow-network

  airflow-worker:
    image: apache/airflow:2.8.0
    container_name: airflow-worker
    command: celery worker
    environment:
      - AIRFLOW__CORE__EXECUTOR=CeleryExecutor
      - AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:${POSTGRES_PASSWORD}@postgres/airflow
      - AIRFLOW__CELERY__BROKER_URL=redis://redis:6379/0
      - AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://airflow:${POSTGRES_PASSWORD}@postgres/airflow
    volumes:
      - ./dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./plugins:/opt/airflow/plugins
    depends_on:
      airflow-init:
        condition: service_completed_successfully
    networks:
      - airflow-network

  flower:
    image: apache/airflow:2.8.0
    container_name: airflow-flower
    command: celery flower
    environment:
      - AIRFLOW__CORE__EXECUTOR=CeleryExecutor
      - AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:${POSTGRES_PASSWORD}@postgres/airflow
      - AIRFLOW__CELERY__BROKER_URL=redis://redis:6379/0
    ports:
      - "5555:5555"
    depends_on:
      airflow-init:
        condition: service_completed_successfully
    networks:
      - airflow-network

volumes:
  postgres_data:

networks:
  airflow-network:
    driver: bridge
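
Optionally, validate the file and the .env substitution before starting anything; this catches indentation and missing-variable mistakes early.

terminal
# Validate the compose file without starting containers
docker compose config --quiet && echo "compose file OK"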

.env Template

.env
# Apache Airflow
POSTGRES_PASSWORD=airflow_db_password
AIRFLOW_ADMIN_PASSWORD=admin123
AIRFLOW_SECRET_KEY=your-secret-key-here
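
The values above are placeholders; replace them before first start. One way to generate random values, assuming openssl is available:

terminal
# Generate random credentials; copy the output into .env
openssl rand -hex 16   # e.g. POSTGRES_PASSWORD
openssl rand -hex 32   # e.g. AIRFLOW_SECRET_KEY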

Usage Notes

  1. Airflow UI at http://localhost:8080
  2. Flower (Celery monitor) at http://localhost:5555
  3. Place DAG files in the ./dags directory
  4. Scale workers with docker compose up -d --scale airflow-worker=N (see the commands after this list)
  5. Create the dags, logs, and plugins directories before starting the stack
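
A sketch of notes 4 and 5 as concrete commands; the worker count of 3 is arbitrary. Note that scaling requires removing the fixed container_name from the airflow-worker service, because Compose cannot run several containers under one fixed name.

terminal
# Create the host directories mounted into every Airflow container
mkdir -p dags logs plugins

# Run three worker replicas (remove container_name from airflow-worker first)
docker compose up -d --scale airflow-worker=3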

Individual Services (7 services)

Copy individual services to mix and match with your existing compose files.

postgres
postgres:
  image: postgres:16-alpine
  container_name: airflow-db
  environment:
    - POSTGRES_USER=airflow
    - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    - POSTGRES_DB=airflow
  volumes:
    - postgres_data:/var/lib/postgresql/data
  healthcheck:
    test:
      - CMD-SHELL
      - pg_isready -U airflow
    interval: 10s
    timeout: 5s
    retries: 5
  networks:
    - airflow-network
redis
redis:
  image: redis:7-alpine
  container_name: airflow-redis
  healthcheck:
    test:
      - CMD
      - redis-cli
      - ping
    interval: 10s
    timeout: 5s
    retries: 5
  networks:
    - airflow-network
airflow-init
airflow-init:
  image: apache/airflow:2.8.0
  container_name: airflow-init
  entrypoint: /bin/bash
  command:
    - "-c"
    - |
      airflow db init
      airflow users create --username admin --password ${AIRFLOW_ADMIN_PASSWORD} --firstname Admin --lastname User --role Admin --email admin@example.com
  environment:
    - AIRFLOW__CORE__EXECUTOR=CeleryExecutor
    - AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:${POSTGRES_PASSWORD}@postgres/airflow
    - AIRFLOW__CELERY__BROKER_URL=redis://redis:6379/0
    - AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://airflow:${POSTGRES_PASSWORD}@postgres/airflow
  depends_on:
    postgres:
      condition: service_healthy
    redis:
      condition: service_healthy
  networks:
    - airflow-network
airflow-webserver
airflow-webserver:
  image: apache/airflow:2.8.0
  container_name: airflow-webserver
  command: webserver
  environment:
    - AIRFLOW__CORE__EXECUTOR=CeleryExecutor
    - AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:${POSTGRES_PASSWORD}@postgres/airflow
    - AIRFLOW__CELERY__BROKER_URL=redis://redis:6379/0
    - AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://airflow:${POSTGRES_PASSWORD}@postgres/airflow
    - AIRFLOW__WEBSERVER__SECRET_KEY=${AIRFLOW_SECRET_KEY}
  ports:
    - "8080:8080"
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
  depends_on:
    airflow-init:
      condition: service_completed_successfully
  networks:
    - airflow-network
airflow-scheduler
airflow-scheduler:
  image: apache/airflow:2.8.0
  container_name: airflow-scheduler
  command: scheduler
  environment:
    - AIRFLOW__CORE__EXECUTOR=CeleryExecutor
    - AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:${POSTGRES_PASSWORD}@postgres/airflow
    - AIRFLOW__CELERY__BROKER_URL=redis://redis:6379/0
    - AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://airflow:${POSTGRES_PASSWORD}@postgres/airflow
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
  depends_on:
    airflow-init:
      condition: service_completed_successfully
  networks:
    - airflow-network
airflow-worker
airflow-worker:
  image: apache/airflow:2.8.0
  container_name: airflow-worker
  command: celery worker
  environment:
    - AIRFLOW__CORE__EXECUTOR=CeleryExecutor
    - AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:${POSTGRES_PASSWORD}@postgres/airflow
    - AIRFLOW__CELERY__BROKER_URL=redis://redis:6379/0
    - AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://airflow:${POSTGRES_PASSWORD}@postgres/airflow
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
  depends_on:
    airflow-init:
      condition: service_completed_successfully
  networks:
    - airflow-network
flower
flower:
  image: apache/airflow:2.8.0
  container_name: airflow-flower
  command: celery flower
  environment:
    - AIRFLOW__CORE__EXECUTOR=CeleryExecutor
    - AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:${POSTGRES_PASSWORD}@postgres/airflow
    - AIRFLOW__CELERY__BROKER_URL=redis://redis:6379/0
  ports:
    - "5555:5555"
  depends_on:
    airflow-init:
      condition: service_completed_successfully
  networks:
    - airflow-network
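
When copying a single service into an existing project, remember that it still references the airflow-network network and, for postgres, the postgres_data volume, so carry those definitions over as well. One way to combine files, sketched below with a placeholder file name, is Compose's multi-file merge:

terminal
# "airflow-extras.yml" is a placeholder for the file holding the copied services
docker compose -f docker-compose.yml -f airflow-extras.yml up -d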

Quick Start

terminal
# 1. Create docker-compose.yml and .env from the sections above

# 2. Create the host directories mounted by the containers
mkdir -p dags logs plugins

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f

One-Liner

Run this command to download and set up the recipe in one step:

terminal
curl -fsSL https://docker.recipes/api/recipes/airflow-celery-redis/run | bash

Troubleshooting

  • ImportError when loading DAGs: Ensure custom Python packages are installed in all Airflow containers using requirements.txt or custom Docker images
  • Tasks stuck in queued state: Check Redis connectivity and verify Celery workers are running with docker compose logs airflow-worker
  • Scheduler not picking up new DAGs: Restart airflow-scheduler container and verify DAG files have correct Python syntax and are in /opt/airflow/dags
  • Database connection errors: Verify PostgreSQL container is healthy and POSTGRES_PASSWORD environment variable matches across all services
  • Worker memory issues: Increase Docker container memory limits or reduce AIRFLOW__CELERY__WORKER_CONCURRENCY to prevent OOM kills (see the override sketch after this list)
  • Web UI 403 Forbidden errors: Check AIRFLOW__WEBSERVER__SECRET_KEY is set and consistent across restarts, regenerate if corrupted
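
For the worker-memory item above, the sketch below lowers Celery concurrency through a Compose override file; the value 4 is an illustrative assumption (Airflow's default is 16), so tune it to your task sizes.

terminal
# docker-compose.override.yml is merged automatically by `docker compose up`
cat > docker-compose.override.yml << 'EOF'
services:
  airflow-worker:
    environment:
      - AIRFLOW__CELERY__WORKER_CONCURRENCY=4
EOF

# Recreate the worker with the lower concurrency
docker compose up -d airflow-worker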


Components

airflow-webserver, airflow-scheduler, airflow-worker, postgres, redis

Tags

#airflow #workflow #orchestration #celery #etl #scheduling

Category

DevOps & CI/CD