docker.recipes

Dagster Data Platform

intermediate

Dagster data orchestration platform with PostgreSQL and web UI.

Overview

Dagster is a modern data orchestration platform, created by Elementl (now Dagster Labs), that transforms how organizations build, test, and monitor data pipelines. Unlike traditional workflow engines that focus solely on task execution, Dagster provides a comprehensive framework for data asset management with strong typing, automatic lineage tracking, and powerful observability features. It emerged in 2019 as a response to the limitations of existing tools such as Airflow, offering a code-first approach that treats data pipelines as software engineering artifacts with proper testing, versioning, and development practices.

This stack combines Dagster's multi-component architecture with PostgreSQL as the persistent storage backend. The dagster-webserver serves the Dagster web UI (formerly Dagit) for pipeline visualization and monitoring, while dagster-daemon handles background processes such as scheduling, sensor evaluation, and run queue management. PostgreSQL stores all metadata, including asset definitions, run history, event logs, and schedule state, providing ACID compliance and the querying capabilities Dagster relies on for lineage analysis and performance optimization.

Data engineers and analytics teams building modern data platforms will find this combination particularly valuable for replacing fragmented toolchains. Organizations migrating from legacy ETL tools, teams implementing DataOps practices, and companies requiring strong data governance and observability should consider this stack. It excels in environments where data quality, testing, and collaboration between data teams are critical, making it a good fit for financial services, healthcare, and technology companies with complex data transformation requirements.

Key Features

  • Asset-based data modeling with automatic dependency resolution and lineage tracking across complex data transformations
  • Software-defined assets (SDAs) that treat data tables, ML models, and reports as first-class versioned objects
  • Built-in data quality testing framework with configurable asset checks and automated anomaly detection
  • Multi-tenancy support through workspaces allowing isolated development and production pipeline environments
  • Native integration with dbt, Spark, Pandas, and cloud data platforms through Dagster's extensive library ecosystem
  • Backfill capabilities for historical data processing with intelligent partitioning and incremental updates
  • Real-time monitoring with Slack, email, and webhook notifications for pipeline failures and SLA violations
  • PostgreSQL-backed metadata store providing fast queries for asset lineage, run history, and performance analytics
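
The "automatic dependency resolution" above boils down to ordering the asset graph so every upstream asset materializes before its consumers; conceptually this is a topological sort. A toy sketch in plain Python (hypothetical asset names, not the Dagster API):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical asset graph: each asset maps to the set of assets it depends on.
asset_deps = {
    "raw_orders": set(),
    "cleaned_orders": {"raw_orders"},
    "daily_revenue": {"cleaned_orders"},
    "revenue_report": {"daily_revenue", "cleaned_orders"},
}

# Resolve a valid materialization order: upstream assets always come first.
order = list(TopologicalSorter(asset_deps).static_order())
print(order)
```

In real Dagster code you declare the same structure with `@asset` functions whose parameters name their upstream assets, and the framework derives this ordering for you.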

Common Use Cases

  • Modern data warehouse orchestration replacing legacy ETL tools like Informatica or DataStage with code-first pipelines
  • MLOps workflows for training, validating, and deploying machine learning models with feature store integration
  • Data lake processing for ingesting and transforming large volumes of raw data from multiple sources
  • Business intelligence pipeline automation connecting data sources to reporting tools like Tableau or Looker
  • Real-time data processing for streaming analytics and event-driven data transformations
  • Data migration projects requiring complex validation, transformation, and quality assurance processes
  • Multi-environment data platform management for development, staging, and production data pipeline deployment

Prerequisites

  • Minimum 4GB RAM recommended for running Dagster webserver, daemon, and PostgreSQL with moderate workloads
  • Docker Engine 20.10+ and Docker Compose V2 for proper container networking and volume management
  • Port 3000 available for Dagster web interface access and port 5432 free for PostgreSQL connections
  • Python development knowledge for writing Dagster pipelines, assets, and custom resource configurations
  • Understanding of data pipeline concepts including ETL/ELT patterns, data lineage, and workflow orchestration
  • Basic SQL knowledge for database troubleshooting and custom metadata queries against PostgreSQL backend
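
To verify the port prerequisite before bringing the stack up, a small stdlib-only sketch (the helper name is hypothetical):

```python
import socket

def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 on success (something is listening), nonzero otherwise.
        return s.connect_ex((host, port)) != 0

# Defaults used by this stack: Dagster UI on 3000, PostgreSQL on 5432.
for port in (3000, 5432):
    print(port, "free" if port_is_free(port) else "in use")
```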

For development & testing only. Review security settings, change default credentials, and test thoroughly before production use.

docker-compose.yml

docker-compose.yml
services:
  postgres:
    image: postgres:16-alpine
    container_name: dagster-postgres
    restart: unless-stopped
    environment:
      POSTGRES_USER: ${POSTGRES_USER:-dagster}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-dagster}
      POSTGRES_DB: ${POSTGRES_DB:-dagster}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - dagster-network

  dagster-webserver:
    image: dagster/dagster-k8s:latest
    container_name: dagster-webserver
    restart: unless-stopped
    entrypoint:
      - dagster-webserver
      - -h
      - "0.0.0.0"
      - -p
      - "3000"
    ports:
      - "${DAGSTER_PORT:-3000}:3000"
    environment:
      DAGSTER_POSTGRES_USER: ${POSTGRES_USER:-dagster}
      DAGSTER_POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-dagster}
      DAGSTER_POSTGRES_DB: ${POSTGRES_DB:-dagster}
      DAGSTER_POSTGRES_HOST: postgres
    depends_on:
      - postgres
    networks:
      - dagster-network

  dagster-daemon:
    image: dagster/dagster-k8s:latest
    container_name: dagster-daemon
    restart: unless-stopped
    entrypoint:
      - dagster-daemon
      - run
    environment:
      DAGSTER_POSTGRES_USER: ${POSTGRES_USER:-dagster}
      DAGSTER_POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-dagster}
      DAGSTER_POSTGRES_DB: ${POSTGRES_DB:-dagster}
      DAGSTER_POSTGRES_HOST: postgres
    depends_on:
      - postgres
    networks:
      - dagster-network

volumes:
  postgres_data:

networks:
  dagster-network:
    driver: bridge
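
The DAGSTER_POSTGRES_* variables are read by the instance configuration baked into the image; conceptually, webserver and daemon resolve them into a standard PostgreSQL connection string. A sketch of that resolution (hypothetical helper and port assumption; Dagster's actual wiring lives in dagster.yaml):

```python
import os

def dagster_pg_dsn(env: dict) -> str:
    """Assemble the PostgreSQL DSN implied by the compose environment block.

    Defaults mirror the ${VAR:-dagster} fallbacks in docker-compose.yml;
    port 5432 is PostgreSQL's default and an assumption here.
    """
    user = env.get("DAGSTER_POSTGRES_USER", "dagster")
    password = env.get("DAGSTER_POSTGRES_PASSWORD", "dagster")
    host = env.get("DAGSTER_POSTGRES_HOST", "postgres")
    db = env.get("DAGSTER_POSTGRES_DB", "dagster")
    return f"postgresql://{user}:{password}@{host}:5432/{db}"

print(dagster_pg_dsn(dict(os.environ)))
```

Note that `postgres` resolves as a hostname only inside dagster-network, which is why both Dagster services join that network.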

.env Template

.env
# Dagster
DAGSTER_PORT=3000
POSTGRES_USER=dagster
POSTGRES_PASSWORD=dagster
POSTGRES_DB=dagster
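
The `${VAR:-default}` references in the compose file fall back to a default when the variable is unset or empty, so the stack works even without this .env file. A minimal sketch of that substitution rule (plain Python; Docker Compose implements this natively):

```python
import re

def substitute(template: str, env: dict) -> str:
    """Expand ${VAR:-default}: use the env value unless it is unset or empty."""
    def repl(match: re.Match) -> str:
        value = env.get(match.group("var"), "")
        return value if value else match.group("default")
    return re.sub(r"\$\{(?P<var>\w+):-(?P<default>[^}]*)\}", repl, template)

line = "POSTGRES_USER: ${POSTGRES_USER:-dagster}"
print(substitute(line, {}))                        # falls back to the default
print(substitute(line, {"POSTGRES_USER": "etl"}))  # uses the provided value
```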

Usage Notes

  1. Dagster UI at http://localhost:3000
  2. Define pipelines in Python
  3. Daemon handles schedules and sensors
  4. Great for modern data pipelines

Individual Services (3 services)

Copy individual services to mix and match with your existing compose files.

postgres
postgres:
  image: postgres:16-alpine
  container_name: dagster-postgres
  restart: unless-stopped
  environment:
    POSTGRES_USER: ${POSTGRES_USER:-dagster}
    POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-dagster}
    POSTGRES_DB: ${POSTGRES_DB:-dagster}
  volumes:
    - postgres_data:/var/lib/postgresql/data
  networks:
    - dagster-network
dagster-webserver
dagster-webserver:
  image: dagster/dagster-k8s:latest
  container_name: dagster-webserver
  restart: unless-stopped
  entrypoint:
    - dagster-webserver
    - -h
    - "0.0.0.0"
    - -p
    - "3000"
  ports:
    - "${DAGSTER_PORT:-3000}:3000"
  environment:
    DAGSTER_POSTGRES_USER: ${POSTGRES_USER:-dagster}
    DAGSTER_POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-dagster}
    DAGSTER_POSTGRES_DB: ${POSTGRES_DB:-dagster}
    DAGSTER_POSTGRES_HOST: postgres
  depends_on:
    - postgres
  networks:
    - dagster-network
dagster-daemon
dagster-daemon:
  image: dagster/dagster-k8s:latest
  container_name: dagster-daemon
  restart: unless-stopped
  entrypoint:
    - dagster-daemon
    - run
  environment:
    DAGSTER_POSTGRES_USER: ${POSTGRES_USER:-dagster}
    DAGSTER_POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-dagster}
    DAGSTER_POSTGRES_DB: ${POSTGRES_DB:-dagster}
    DAGSTER_POSTGRES_HOST: postgres
  depends_on:
    - postgres
  networks:
    - dagster-network

Quick Start

terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  postgres:
    image: postgres:16-alpine
    container_name: dagster-postgres
    restart: unless-stopped
    environment:
      POSTGRES_USER: ${POSTGRES_USER:-dagster}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-dagster}
      POSTGRES_DB: ${POSTGRES_DB:-dagster}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - dagster-network

  dagster-webserver:
    image: dagster/dagster-k8s:latest
    container_name: dagster-webserver
    restart: unless-stopped
    entrypoint:
      - dagster-webserver
      - -h
      - "0.0.0.0"
      - -p
      - "3000"
    ports:
      - "${DAGSTER_PORT:-3000}:3000"
    environment:
      DAGSTER_POSTGRES_USER: ${POSTGRES_USER:-dagster}
      DAGSTER_POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-dagster}
      DAGSTER_POSTGRES_DB: ${POSTGRES_DB:-dagster}
      DAGSTER_POSTGRES_HOST: postgres
    depends_on:
      - postgres
    networks:
      - dagster-network

  dagster-daemon:
    image: dagster/dagster-k8s:latest
    container_name: dagster-daemon
    restart: unless-stopped
    entrypoint:
      - dagster-daemon
      - run
    environment:
      DAGSTER_POSTGRES_USER: ${POSTGRES_USER:-dagster}
      DAGSTER_POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-dagster}
      DAGSTER_POSTGRES_DB: ${POSTGRES_DB:-dagster}
      DAGSTER_POSTGRES_HOST: postgres
    depends_on:
      - postgres
    networks:
      - dagster-network

volumes:
  postgres_data:

networks:
  dagster-network:
    driver: bridge
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# Dagster
DAGSTER_PORT=3000
POSTGRES_USER=dagster
POSTGRES_PASSWORD=dagster
POSTGRES_DB=dagster
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f

One-Liner

Run this command to download and set up the recipe in one step:

terminal
curl -fsSL https://docker.recipes/api/recipes/dagster-data-platform/run | bash

Troubleshooting

  • Dagster webserver fails to start with database connection errors: Verify PostgreSQL container is fully initialized before webserver startup and check DAGSTER_POSTGRES_* environment variables match database credentials
  • Pipeline runs stuck in QUEUED status indefinitely: Restart dagster-daemon container as the daemon may have stopped processing the run queue due to resource constraints or configuration issues
  • Asset materialization failures with 'ImportError' messages: Mount your Dagster project code directory into both webserver and daemon containers using additional volume mounts
  • High memory usage causing container crashes: Increase Docker memory limits and consider adding PostgreSQL connection pooling configuration to prevent connection exhaustion
  • Dagster UI shows 'No repositories found' error: Ensure your workspace.yaml configuration file is properly mounted and points at valid code locations (repositories)
  • Slow query performance in PostgreSQL affecting pipeline execution: Add database indexes on frequently queried event log tables and consider PostgreSQL performance tuning parameters
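
For the first issue above, one option is to gate webserver startup on PostgreSQL actually accepting TCP connections rather than relying on a bare depends_on. A hedged, stdlib-only sketch of such a wait loop (the helper name is hypothetical; a compose healthcheck with `condition: service_healthy` achieves the same thing declaratively):

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 60.0) -> bool:
    """Poll host:port until it accepts a TCP connection or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(1.0)
            if s.connect_ex((host, port)) == 0:
                return True  # something is listening
        time.sleep(1.0)
    return False

# Inside the webserver container one might call, e.g.:
#   wait_for_port("postgres", 5432)   # service name resolves on dagster-network
```

Note this only proves the TCP port is open; PostgreSQL can briefly accept connections during initialization before it is ready for queries, so a healthcheck using pg_isready is more robust.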

