
Apache Airflow Data Pipeline

advanced

Apache Airflow with Celery executors, PostgreSQL, Redis, and Flower monitoring

Overview

Apache Airflow is an open-source platform for programmatically authoring, scheduling, and monitoring workflows, originally developed at Airbnb in 2014 to solve complex data pipeline orchestration challenges. It allows data engineers to define workflows as code using Python DAGs (Directed Acyclic Graphs), making it a go-to solution for ETL processes, machine learning pipelines, and task automation across thousands of organizations worldwide.

The Celery executor enables horizontal scaling by distributing task execution across multiple worker nodes, while PostgreSQL serves as the metadata database storing DAG runs, task instances, and connection details. Redis acts as the message broker for Celery, queuing tasks and enabling real-time communication between the scheduler and workers, while Flower provides a web-based monitoring interface for tracking worker performance and task distribution.

This production-grade stack solves the fundamental challenge of reliable, scalable workflow orchestration by combining Airflow's rich scheduling capabilities with Celery's distributed task execution, PostgreSQL's ACID compliance for critical metadata, and Redis's sub-millisecond message passing. Data engineering teams, DevOps professionals, and organizations running complex ETL pipelines will benefit from this configuration's ability to handle thousands of concurrent tasks while maintaining visibility into workflow execution, task dependencies, and system health.
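To make "workflows as code" concrete, here is a minimal sketch of a DAG you could drop into the ./dags directory once the stack is up. The DAG id, schedule, and echo commands are illustrative placeholders, not part of the recipe:

terminal
mkdir -p dags
cat > dags/hello_etl.py << 'EOF'
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# hypothetical two-step pipeline: "extract" must finish before "load"
with DAG(
    dag_id="hello_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # cron expressions also work here
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    load = BashOperator(task_id="load", bash_command="echo loading")
    extract >> load  # ">>" declares the dependency edge
EOF

Note that this stack sets DAGS_ARE_PAUSED_AT_CREATION to 'true', so a new DAG like hello_etl appears paused and must be unpaused in the UI before the scheduler runs it.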

Key Features

  • Python-based DAG definition with rich operator library for databases, cloud services, and APIs
  • CeleryExecutor for horizontal scaling with dynamic worker allocation across multiple nodes
  • PostgreSQL metadata database with ACID compliance for reliable DAG run and task state tracking
  • Redis message broker enabling sub-millisecond task queuing and worker communication
  • Flower monitoring dashboard for real-time Celery worker metrics and task distribution analytics
  • Triggerer service for efficient deferrable operator handling without blocking worker slots
  • Web UI with Gantt charts, task logs, and DAG visualization for comprehensive workflow monitoring
  • Connection and variable management with encryption for secure credential storage
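As an example of the last point, connections can be managed from the CLI as well as the web UI; the connection id and URI below are made-up placeholders. Credentials stored this way are encrypted at rest with the stack's Fernet key:

terminal
docker compose exec airflow-webserver \
  airflow connections add 'warehouse_db' \
  --conn-uri 'postgresql://etl_user:etl_pass@warehouse-host:5432/analytics'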

Common Use Cases

  • ETL pipelines processing data from multiple sources into data warehouses with complex dependencies
  • Machine learning workflows orchestrating model training, validation, and deployment across environments
  • Data quality monitoring with automated checks, alerts, and remediation workflows
  • Multi-cloud data synchronization between AWS S3, Google Cloud Storage, and Azure Blob Storage
  • Financial reporting automation with regulatory compliance checks and audit trails
  • IoT data processing pipelines handling sensor data ingestion, transformation, and analytics
  • Marketing campaign automation with customer segmentation, email triggers, and performance tracking

Prerequisites

  • Docker Engine 20.10+ and Docker Compose 2.0+ with at least 6GB available RAM for all services
  • Python knowledge for writing DAGs and understanding Airflow's operator patterns
  • Basic PostgreSQL administration skills for database maintenance and query optimization
  • Understanding of Celery distributed task concepts and worker scaling strategies
  • Network ports 8080 (Airflow UI) and 5555 (Flower) available and not blocked by firewalls
  • Environment variables configured: POSTGRES_PASSWORD, FERNET_KEY, and ADMIN_PASSWORD
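A quick sanity pass over these prerequisites might look like the following. The docker run variant generates the Fernet key without needing the cryptography package on the host, since the Airflow image bundles it:

terminal
docker --version && docker compose version
# generate a Fernet key using the Airflow image itself
docker run --rm apache/airflow:2.8.0 python -c \
  "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"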

For development & testing: review security settings, change default credentials, and test thoroughly before production use.

docker-compose.yml

docker-compose.yml
x-airflow-common: &airflow-common
  image: apache/airflow:2.8.0
  environment: &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:${POSTGRES_PASSWORD}@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:${POSTGRES_PASSWORD}@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ${FERNET_KEY}
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth'
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
  depends_on:
    - redis
    - postgres

services:
  postgres:
    image: postgres:15-alpine
    container_name: airflow-postgres
    restart: unless-stopped
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=airflow
    volumes:
      - postgres_data:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine
    container_name: airflow-redis
    restart: unless-stopped

  airflow-webserver:
    <<: *airflow-common
    container_name: airflow-webserver
    command: webserver
    ports:
      - "${AIRFLOW_PORT:-8080}:8080"
    restart: unless-stopped

  airflow-scheduler:
    <<: *airflow-common
    container_name: airflow-scheduler
    command: scheduler
    restart: unless-stopped

  airflow-worker:
    <<: *airflow-common
    # no container_name here: a fixed name would block --scale airflow-worker=N
    command: celery worker
    restart: unless-stopped

  airflow-triggerer:
    <<: *airflow-common
    container_name: airflow-triggerer
    command: triggerer
    restart: unless-stopped

  airflow-init:
    <<: *airflow-common
    container_name: airflow-init
    entrypoint: /bin/bash
    command:
      - -c
      - |
        airflow db init
        airflow users create --username admin --firstname Admin --lastname User --role Admin --email admin@example.com --password ${ADMIN_PASSWORD}
    profiles:
      - init

  flower:
    <<: *airflow-common
    container_name: airflow-flower
    command: celery flower
    ports:
      - "${FLOWER_PORT:-5555}:5555"
    restart: unless-stopped

volumes:
  postgres_data:
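The x-airflow-common block is a YAML anchor that every Airflow service merges in via <<: *airflow-common. To confirm the anchors and ${...} variables expand the way you expect before starting anything, render the effective configuration:

terminal
docker compose config            # full expanded configuration
docker compose config --services # just the service names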

.env Template

.env
# Apache Airflow Stack
AIRFLOW_PORT=8080
FLOWER_PORT=5555

# Database
POSTGRES_PASSWORD=airflow_password

# Fernet key (generate with: python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())")
FERNET_KEY=your-fernet-key-here

# Admin user
ADMIN_PASSWORD=admin
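Fernet keys use the URL-safe base64 alphabet, so a sed substitution with | as the delimiter can safely swap a freshly generated key into the template (GNU sed shown; on macOS use sed -i ''):

terminal
KEY=$(docker run --rm apache/airflow:2.8.0 python -c \
  "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())")
sed -i "s|your-fernet-key-here|${KEY}|" .env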

Usage Notes

  1. Initialize first: docker compose --profile init up airflow-init
  2. Airflow UI at http://localhost:8080 (admin/admin)
  3. Flower (worker monitoring) at http://localhost:5555
  4. Place DAG files in the ./dags directory
  5. Scale workers: docker compose up -d --scale airflow-worker=3 (the worker service deliberately omits container_name so it can scale; see the snippet after this list)
  6. Check task logs in the ./logs directory
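For note 5, a scaling session might look like this; it works because the worker service defines no container_name (Compose cannot scale a service with a fixed name):

terminal
docker compose up -d --scale airflow-worker=3
docker compose ps airflow-worker   # should list three worker containers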

Individual Services (8 services)

Copy individual services to mix and match with your existing compose files.
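For example, to layer one of these snippets onto an existing project, wrap it under a top-level services: key in its own file (airflow-extra.yml is a hypothetical name) and pass both files to Compose, which merges them in order:

terminal
docker compose -f docker-compose.yml -f airflow-extra.yml config  # preview the merge
docker compose -f docker-compose.yml -f airflow-extra.yml up -d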

postgres
postgres:
  image: postgres:15-alpine
  container_name: airflow-postgres
  restart: unless-stopped
  environment:
    - POSTGRES_USER=airflow
    - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    - POSTGRES_DB=airflow
  volumes:
    - postgres_data:/var/lib/postgresql/data
redis
redis:
  image: redis:7-alpine
  container_name: airflow-redis
  restart: unless-stopped
airflow-webserver
airflow-webserver:
  image: apache/airflow:2.8.0
  environment:
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:${POSTGRES_PASSWORD}@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:${POSTGRES_PASSWORD}@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ${FERNET_KEY}
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: "true"
    AIRFLOW__CORE__LOAD_EXAMPLES: "false"
    AIRFLOW__API__AUTH_BACKENDS: airflow.api.auth.backend.basic_auth
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
  depends_on:
    - redis
    - postgres
  container_name: airflow-webserver
  command: webserver
  ports:
    - ${AIRFLOW_PORT:-8080}:8080
  restart: unless-stopped
airflow-scheduler
airflow-scheduler:
  image: apache/airflow:2.8.0
  environment:
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:${POSTGRES_PASSWORD}@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:${POSTGRES_PASSWORD}@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ${FERNET_KEY}
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: "true"
    AIRFLOW__CORE__LOAD_EXAMPLES: "false"
    AIRFLOW__API__AUTH_BACKENDS: airflow.api.auth.backend.basic_auth
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
  depends_on:
    - redis
    - postgres
  container_name: airflow-scheduler
  command: scheduler
  restart: unless-stopped
airflow-worker
airflow-worker:
  image: apache/airflow:2.8.0
  environment:
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:${POSTGRES_PASSWORD}@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:${POSTGRES_PASSWORD}@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ${FERNET_KEY}
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: "true"
    AIRFLOW__CORE__LOAD_EXAMPLES: "false"
    AIRFLOW__API__AUTH_BACKENDS: airflow.api.auth.backend.basic_auth
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
  depends_on:
    - redis
    - postgres
  # container_name intentionally omitted so the service can be scaled with --scale
  command: celery worker
  restart: unless-stopped
airflow-triggerer
airflow-triggerer:
  image: apache/airflow:2.8.0
  environment:
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:${POSTGRES_PASSWORD}@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:${POSTGRES_PASSWORD}@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ${FERNET_KEY}
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: "true"
    AIRFLOW__CORE__LOAD_EXAMPLES: "false"
    AIRFLOW__API__AUTH_BACKENDS: airflow.api.auth.backend.basic_auth
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
  depends_on:
    - redis
    - postgres
  container_name: airflow-triggerer
  command: triggerer
  restart: unless-stopped
airflow-init
airflow-init:
  image: apache/airflow:2.8.0
  environment:
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:${POSTGRES_PASSWORD}@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:${POSTGRES_PASSWORD}@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ${FERNET_KEY}
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: "true"
    AIRFLOW__CORE__LOAD_EXAMPLES: "false"
    AIRFLOW__API__AUTH_BACKENDS: airflow.api.auth.backend.basic_auth
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
  depends_on:
    - redis
    - postgres
  container_name: airflow-init
  entrypoint: /bin/bash
  command:
    - "-c"
    - |
      airflow db init
      airflow users create --username admin --firstname Admin --lastname User --role Admin --email admin@example.com --password ${ADMIN_PASSWORD}
  profiles:
    - init
flower
flower:
  image: apache/airflow:2.8.0
  environment:
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:${POSTGRES_PASSWORD}@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:${POSTGRES_PASSWORD}@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ${FERNET_KEY}
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: "true"
    AIRFLOW__CORE__LOAD_EXAMPLES: "false"
    AIRFLOW__API__AUTH_BACKENDS: airflow.api.auth.backend.basic_auth
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
  depends_on:
    - redis
    - postgres
  container_name: airflow-flower
  command: celery flower
  ports:
    - ${FLOWER_PORT:-5555}:5555
  restart: unless-stopped

Quick Start

terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
x-airflow-common: &airflow-common
  image: apache/airflow:2.8.0
  environment: &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:${POSTGRES_PASSWORD}@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:${POSTGRES_PASSWORD}@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ${FERNET_KEY}
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__API__AUTH_BACKENDS: 'airflow.api.auth.backend.basic_auth'
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
    - ./plugins:/opt/airflow/plugins
  depends_on:
    - redis
    - postgres

services:
  postgres:
    image: postgres:15-alpine
    container_name: airflow-postgres
    restart: unless-stopped
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=airflow
    volumes:
      - postgres_data:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine
    container_name: airflow-redis
    restart: unless-stopped

  airflow-webserver:
    <<: *airflow-common
    container_name: airflow-webserver
    command: webserver
    ports:
      - "${AIRFLOW_PORT:-8080}:8080"
    restart: unless-stopped

  airflow-scheduler:
    <<: *airflow-common
    container_name: airflow-scheduler
    command: scheduler
    restart: unless-stopped

  airflow-worker:
    <<: *airflow-common
    # no container_name here: a fixed name would block --scale airflow-worker=N
    command: celery worker
    restart: unless-stopped

  airflow-triggerer:
    <<: *airflow-common
    container_name: airflow-triggerer
    command: triggerer
    restart: unless-stopped

  airflow-init:
    <<: *airflow-common
    container_name: airflow-init
    entrypoint: /bin/bash
    command:
      - -c
      - |
        airflow db init
        airflow users create --username admin --firstname Admin --lastname User --role Admin --email admin@example.com --password ${ADMIN_PASSWORD}
    profiles:
      - init

  flower:
    <<: *airflow-common
    container_name: airflow-flower
    command: celery flower
    ports:
      - "${FLOWER_PORT:-5555}:5555"
    restart: unless-stopped

volumes:
  postgres_data:
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# Apache Airflow Stack
AIRFLOW_PORT=8080
FLOWER_PORT=5555

# Database
POSTGRES_PASSWORD=airflow_password

# Fernet key (generate with: python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())")
FERNET_KEY=your-fernet-key-here

# Admin user
ADMIN_PASSWORD=admin
EOF

# 3. Initialize the metadata database and create the admin user
docker compose --profile init up airflow-init

# 4. Start the services
docker compose up -d

# 5. View logs
docker compose logs -f
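After startup (the webserver can take a minute on first boot), you can confirm the stack is serving requests via Airflow's /health endpoint, which returns JSON reporting metadatabase and scheduler status:

terminal
curl -fsS http://localhost:8080/health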

One-Liner

Run this command to download and set up the recipe in one step:

terminal
curl -fsSL https://docker.recipes/api/recipes/airflow-data-pipeline/run | bash

Troubleshooting

  • ImportError in DAGs folder: Ensure Python dependencies are installed in Airflow image or mount virtual environment
  • Celery workers not picking up tasks: Check Redis connectivity and verify AIRFLOW__CELERY__BROKER_URL environment variable
  • Scheduler not creating task instances: Verify DAG file syntax and check scheduler logs for parsing errors
  • Database connection timeout: Increase PostgreSQL max_connections and check AIRFLOW__DATABASE__SQL_ALCHEMY_CONN format
  • Flower shows no workers: Ensure airflow-worker container is running and Redis broker URL matches across all services
  • Tasks stuck in queued state: Scale up workers with --scale airflow-worker=N or check worker resource limits
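A few quick checks cover most of the failure modes above; airflow db check and airflow dags list-import-errors are standard Airflow 2.x CLI commands:

terminal
docker compose exec redis redis-cli ping                               # broker reachable?
docker compose exec airflow-webserver airflow db check                 # metadata DB reachable?
docker compose exec airflow-webserver airflow dags list-import-errors  # DAG parse failures
docker compose logs --tail=100 airflow-scheduler                       # recent scheduler errors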


Download Recipe Kit

Get all files in a ready-to-deploy package

Includes docker-compose.yml, .env template, README, and license
