Apache Airflow
Platform to programmatically author, schedule, and monitor workflows.
Overview
Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows. Originally created at Airbnb in 2014, Airflow lets data engineers define workflows as code in Python, providing programmatic control over complex data pipelines with rich dependency management, retry logic, and monitoring capabilities. The platform has become a de facto standard for orchestrating ETL processes, ML pipelines, and data engineering workflows in organizations ranging from startups to Fortune 500 companies.
This Docker stack combines Airflow with PostgreSQL as the metadata database and Redis as the message broker for Celery-based distributed task execution. PostgreSQL stores all workflow metadata, task states, and execution history, while Redis handles the message queuing between the scheduler and worker nodes. The CeleryExecutor configuration enables horizontal scaling of task execution across multiple worker containers, making this setup suitable for production workloads that require parallel processing and fault tolerance.
Data engineers, ML engineers, and DevOps teams building automated data pipelines will find this stack particularly valuable. The combination provides enterprise-grade workflow orchestration with the ability to handle complex dependencies, manage task failures gracefully, and scale execution capacity based on workload demands. Unlike simpler cron-based solutions, this Airflow deployment offers rich monitoring, alerting, and the ability to handle dynamic workflows that adapt based on data conditions or external triggers.
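To make "workflows as code" concrete, here is a minimal sketch of a DAG using Airflow 2.x-style imports; the file name, task IDs, and schedule are hypothetical, and the file would go in the ./dags directory this stack mounts.
example_pipeline.py
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# A DAG groups tasks and their dependencies; the scheduler runs it on the given schedule.
with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo 'pulling data'")
    load = BashOperator(task_id="load", bash_command="echo 'loading data'")

    # The >> operator declares that load runs only after extract succeeds.
    extract >> load
Once saved under ./dags, the scheduler picks the file up automatically, as described in the usage notes further down.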
Key Features
- Python DAG (Directed Acyclic Graph) definition for workflow authoring with rich dependency management
- CeleryExecutor for distributed task execution across multiple worker containers
- PostgreSQL metadata database storing workflow definitions, task states, and execution history
- Redis message broker enabling reliable task queuing and worker communication
- Web UI at port 8080 for workflow monitoring, task debugging, and manual trigger management
- Automatic DAG discovery from mounted ./dags directory with hot-reloading capabilities
- Comprehensive logging system with persistent log storage across container restarts
- Built-in retry logic, SLA monitoring, and email alerting for production workflow management (see the sketch after this list)
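The retry and alerting behavior listed above is typically set per task or through default_args. Below is a hedged sketch assuming an Airflow 2.x-style API, with purely illustrative values; email delivery additionally requires SMTP settings this stack does not configure.
resilient_job.py
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

# default_args apply to every task in the DAG unless a task overrides them.
default_args = {
    "retries": 3,                          # retry a failed task up to 3 times
    "retry_delay": timedelta(minutes=5),   # wait 5 minutes between attempts
    "email": ["oncall@example.com"],       # hypothetical address
    "email_on_failure": True,              # send an alert when the task ultimately fails
}

with DAG(
    dag_id="resilient_job",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
    default_args=default_args,
) as dag:
    flaky_step = BashOperator(
        task_id="flaky_step",
        bash_command="exit 0",
        execution_timeout=timedelta(minutes=30),  # mark the task failed if it runs too long
    )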
Common Use Cases
- ETL pipeline orchestration for data warehouses with complex data source dependencies (see the sketch after this list)
- ML model training workflows with feature engineering, training, and deployment stages
- Daily batch processing of business reports with multiple data transformation steps
- Data lake ingestion pipelines handling various file formats and validation rules
- API data synchronization workflows pulling from multiple external services
- Database maintenance tasks including backups, cleanup, and data archival
- Multi-cloud data replication workflows with failure handling and monitoring
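For the ETL-style use cases above, Airflow's TaskFlow API (Airflow 2.x) turns plain Python functions into tasks and passes small results between them via XCom; a minimal, hypothetical sketch:
simple_etl.py
from datetime import datetime

from airflow.decorators import dag, task

@dag(start_date=datetime(2024, 1, 1), schedule="@daily", catchup=False)
def simple_etl():
    @task
    def extract() -> list[dict]:
        # Placeholder for pulling rows from an API or source database.
        return [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 25.5}]

    @task
    def transform(rows: list[dict]) -> float:
        # Placeholder business logic: total the amounts.
        return sum(r["amount"] for r in rows)

    @task
    def load(total: float) -> None:
        # Placeholder for writing the result to a warehouse table.
        print(f"daily total: {total}")

    load(transform(extract()))

simple_etl()
XCom values are stored in the metadata database, so this pattern suits small results like the aggregate here; bulk data should move through external storage instead.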
Prerequisites
- Minimum 4GB RAM (roughly 1GB each for the Airflow webserver, scheduler, and worker, plus about 1GB for PostgreSQL and 512MB for Redis)
- Port 8080 available for Airflow web interface access
- Docker Compose v2+ (the compose file follows the versionless Compose Specification) with volume and network support
- Python knowledge for DAG development and workflow definition
- Understanding of ETL concepts and data pipeline architecture
- PostgreSQL connection string format for database configuration
For development & testing. Review security settings, change default credentials, and test thoroughly before production use.
docker-compose.yml
docker-compose.yml
services:
  airflow-webserver:
    image: apache/airflow:latest
    container_name: airflow-webserver
    command: webserver
    environment:
      AIRFLOW__CORE__EXECUTOR: CeleryExecutor
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${DB_USER}:${DB_PASSWORD}@postgres/${DB_NAME}
      AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
      AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://${DB_USER}:${DB_PASSWORD}@postgres/${DB_NAME}
    volumes:
      - ./dags:/opt/airflow/dags
      - airflow_logs:/opt/airflow/logs
    ports:
      - "8080:8080"
    depends_on:
      - postgres
      - redis
    networks:
      - airflow

  airflow-scheduler:
    image: apache/airflow:latest
    container_name: airflow-scheduler
    command: scheduler
    environment:
      AIRFLOW__CORE__EXECUTOR: CeleryExecutor
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${DB_USER}:${DB_PASSWORD}@postgres/${DB_NAME}
      AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
    volumes:
      - ./dags:/opt/airflow/dags
      - airflow_logs:/opt/airflow/logs
    depends_on:
      - postgres
      - redis
    networks:
      - airflow

  airflow-worker:
    image: apache/airflow:latest
    container_name: airflow-worker
    command: celery worker
    environment:
      AIRFLOW__CORE__EXECUTOR: CeleryExecutor
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${DB_USER}:${DB_PASSWORD}@postgres/${DB_NAME}
      AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
    volumes:
      - ./dags:/opt/airflow/dags
      - airflow_logs:/opt/airflow/logs
    depends_on:
      - postgres
      - redis
    networks:
      - airflow

  postgres:
    image: postgres:16-alpine
    container_name: airflow-postgres
    environment:
      POSTGRES_DB: ${DB_NAME}
      POSTGRES_USER: ${DB_USER}
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - airflow

  redis:
    image: redis:alpine
    container_name: airflow-redis
    networks:
      - airflow

volumes:
  airflow_logs:
  postgres_data:

networks:
  airflow:
    driver: bridge
.env Template
.env
DB_NAME=airflow
DB_USER=airflow
DB_PASSWORD=changeme
Usage Notes
- Docs: https://airflow.apache.org/docs/
- Access the web UI at http://localhost:8080 and log in with the admin user created below (stacks based on the official quick-start compose default to airflow / airflow)
- Initialize the metadata database first: docker exec airflow-webserver airflow db migrate (use airflow db init on versions before 2.7)
- Place Python DAG files in the ./dags folder; they are auto-detected
- Create an admin user: docker exec -it airflow-webserver airflow users create --role Admin --username admin (plus the required --firstname, --lastname, and --email flags)
- CeleryExecutor scales horizontally: remove the fixed container_name and run docker compose up -d --scale airflow-worker=3
Individual Services (5 services)
Copy individual services to mix and match with your existing compose files.
airflow-webserver
airflow-webserver:
image: apache/airflow:latest
container_name: airflow-webserver
command: webserver
environment:
AIRFLOW__CORE__EXECUTOR: CeleryExecutor
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${DB_USER}:${DB_PASSWORD}@postgres/${DB_NAME}
AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://${DB_USER}:${DB_PASSWORD}@postgres/${DB_NAME}
volumes:
- ./dags:/opt/airflow/dags
- airflow_logs:/opt/airflow/logs
ports:
- "8080:8080"
depends_on:
- postgres
- redis
networks:
- airflow
airflow-scheduler
airflow-scheduler:
image: apache/airflow:latest
container_name: airflow-scheduler
command: scheduler
environment:
AIRFLOW__CORE__EXECUTOR: CeleryExecutor
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${DB_USER}:${DB_PASSWORD}@postgres/${DB_NAME}
AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
volumes:
- ./dags:/opt/airflow/dags
- airflow_logs:/opt/airflow/logs
depends_on:
- postgres
- redis
networks:
- airflow
airflow-worker
airflow-worker:
image: apache/airflow:latest
container_name: airflow-worker
command: celery worker
environment:
AIRFLOW__CORE__EXECUTOR: CeleryExecutor
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${DB_USER}:${DB_PASSWORD}@postgres/${DB_NAME}
AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
volumes:
- ./dags:/opt/airflow/dags
- airflow_logs:/opt/airflow/logs
depends_on:
- postgres
- redis
networks:
- airflow
postgres
postgres:
image: postgres:16-alpine
container_name: airflow-postgres
environment:
POSTGRES_DB: ${DB_NAME}
POSTGRES_USER: ${DB_USER}
POSTGRES_PASSWORD: ${DB_PASSWORD}
volumes:
- postgres_data:/var/lib/postgresql/data
networks:
- airflow
redis
redis:
image: redis:alpine
container_name: airflow-redis
networks:
- airflow
Quick Start
terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  airflow-webserver:
    image: apache/airflow:latest
    container_name: airflow-webserver
    command: webserver
    environment:
      AIRFLOW__CORE__EXECUTOR: CeleryExecutor
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${DB_USER}:${DB_PASSWORD}@postgres/${DB_NAME}
      AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
      AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://${DB_USER}:${DB_PASSWORD}@postgres/${DB_NAME}
    volumes:
      - ./dags:/opt/airflow/dags
      - airflow_logs:/opt/airflow/logs
    ports:
      - "8080:8080"
    depends_on:
      - postgres
      - redis
    networks:
      - airflow

  airflow-scheduler:
    image: apache/airflow:latest
    container_name: airflow-scheduler
    command: scheduler
    environment:
      AIRFLOW__CORE__EXECUTOR: CeleryExecutor
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${DB_USER}:${DB_PASSWORD}@postgres/${DB_NAME}
      AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
    volumes:
      - ./dags:/opt/airflow/dags
      - airflow_logs:/opt/airflow/logs
    depends_on:
      - postgres
      - redis
    networks:
      - airflow

  airflow-worker:
    image: apache/airflow:latest
    container_name: airflow-worker
    command: celery worker
    environment:
      AIRFLOW__CORE__EXECUTOR: CeleryExecutor
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://${DB_USER}:${DB_PASSWORD}@postgres/${DB_NAME}
      AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/0
    volumes:
      - ./dags:/opt/airflow/dags
      - airflow_logs:/opt/airflow/logs
    depends_on:
      - postgres
      - redis
    networks:
      - airflow

  postgres:
    image: postgres:16-alpine
    container_name: airflow-postgres
    environment:
      POSTGRES_DB: ${DB_NAME}
      POSTGRES_USER: ${DB_USER}
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - airflow

  redis:
    image: redis:alpine
    container_name: airflow-redis
    networks:
      - airflow

volumes:
  airflow_logs:
  postgres_data:

networks:
  airflow:
    driver: bridge
EOF

# 2. Create the .env file
cat > .env << 'EOF'
DB_NAME=airflow
DB_USER=airflow
DB_PASSWORD=changeme
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
One-Liner
Run this command to download and set up the recipe in one step:
terminal
curl -fsSL https://docker.recipes/api/recipes/airflow/run | bash
Troubleshooting
- Import errors in DAGs: check that the DAG files' Python dependencies are installed in the Airflow image and that the files parse cleanly (see the parse-check sketch after this list)
- Scheduler not picking up new DAGs: Verify ./dags volume mount and check DAG parsing errors in logs
- Tasks stuck in queued state: Check Redis connectivity and restart airflow-worker container
- Database connection failures: Verify PostgreSQL container health and environment variables match
- Web UI shows 'Airflow not ready': run docker exec airflow-webserver airflow db migrate (airflow db init on versions before 2.7) to initialize the database
- Worker tasks timing out: Increase Celery task timeout settings or scale worker replicas
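For the DAG import errors in the first item, one way to surface parse problems before deploying is to load the dags folder with Airflow's DagBag from any environment where apache-airflow is installed. A hedged sketch; the file name is hypothetical:
check_dags.py
from airflow.models import DagBag

# Parse every file under ./dags the same way the scheduler would.
dag_bag = DagBag(dag_folder="dags", include_examples=False)

if dag_bag.import_errors:
    # import_errors maps each failing file path to the traceback the parser hit.
    for path, err in dag_bag.import_errors.items():
        print(f"Import error in {path}:\n{err}\n")
    raise SystemExit(1)

print(f"Parsed {len(dag_bag.dags)} DAG(s) without import errors.")
Recent Airflow releases also expose a similar check on the CLI as airflow dags list-import-errors, which can be run inside the scheduler container.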