DVC + MLflow Data Science Stack
Data Version Control with MLflow experiment tracking for reproducible ML pipelines.
Overview
MLflow is an open-source platform designed to manage the complete machine learning lifecycle, from experimentation to deployment. Developed by Databricks, MLflow addresses the complexity of ML workflows by providing experiment tracking, model versioning, and artifact management capabilities that help data scientists and ML engineers maintain reproducibility across their projects. The platform has become essential for teams needing to scale ML operations beyond individual notebooks and ad-hoc experiments.
This stack combines MLflow with Data Version Control (DVC), using MinIO for S3-compatible object storage and PostgreSQL for metadata persistence. MLflow handles experiment tracking and the model registry, while MinIO stores large datasets, model artifacts, and training outputs. PostgreSQL holds experiment metadata, parameters, and metrics in a structured, queryable form. Jupyter provides the interactive development environment, where experiments log to MLflow through the preset tracking URI and read versioned data from MinIO.
Data science teams working on collaborative ML projects will find this stack particularly valuable for establishing reproducible workflows. The combination addresses common challenges such as dataset versioning, experiment comparison, model lineage, and artifact storage. Research teams, ML startups, and enterprise data science groups can use this setup to implement MLOps practices without vendor lock-in, keeping full control over their ML infrastructure while using industry-standard tools for version control and experiment management.
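To make the division of labor concrete, here is a minimal stdlib-only Python sketch of what a tracking client sends to the MLflow server when a notebook logs a metric. In day-to-day use you would normally call the mlflow Python package instead (for example mlflow.log_metric), which is assumed to be installed in the notebook environment. The helper names build_request and log_metric are illustrative and not part of the recipe; the endpoint path follows MLflow's REST API under /api/2.0/mlflow/.

```python
import json
import time
import urllib.request

# Inside the Docker network the tracking server is reachable by service name.
TRACKING_URI = "http://mlflow:5000"


def build_request(base_url, endpoint, payload):
    """Build a JSON POST request for an MLflow REST endpoint (no network I/O)."""
    url = f"{base_url.rstrip('/')}/api/2.0/mlflow/{endpoint}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def log_metric(run_id, key, value, step=0, base_url=TRACKING_URI):
    """Send one metric value for an existing run; needs a running server."""
    payload = {
        "run_id": run_id,
        "key": key,
        "value": value,
        "timestamp": int(time.time() * 1000),  # milliseconds since the epoch
        "step": step,
    }
    request = build_request(base_url, "runs/log-metric", payload)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Building the request has no side effects, so it can be inspected offline.
example = build_request(
    TRACKING_URI,
    "runs/log-metric",
    {"run_id": "abc", "key": "rmse", "value": 0.42},
)
print(example.full_url)
```

Parameters and metrics sent this way end up in PostgreSQL. Artifacts take a different path: with an s3:// artifact root, the client writes them to the mlflow-artifacts bucket in MinIO over the S3 API.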
Key Features
- Experiment tracking with automatic parameter, metric, and artifact logging
- S3-compatible object storage for datasets, models, and large ML artifacts
- Model registry with versioning and stage transitions for production deployment
- PostgreSQL backend for scalable experiment metadata and query performance
- DVC integration for data versioning and pipeline reproducibility
- JupyterLab interface with pre-configured MLflow and MinIO connectivity
- Bucket lifecycle management for automated artifact retention policies
- Multi-framework ML support including scikit-learn, TensorFlow, and PyTorch
Common Use Cases
- ML research teams tracking multiple experiment runs with hyperparameter tuning
- Data science organizations implementing reproducible ML pipelines with version control
- Startups building ML products requiring model versioning and deployment tracking
- Enterprise teams collaborating on ML projects with shared experiment history
- Academic institutions teaching MLOps practices with self-hosted infrastructure
- Companies migrating from cloud ML platforms to on-premises solutions
- Teams requiring audit trails and compliance tracking for ML model development
Prerequisites
- Docker Engine 20.10+ and Docker Compose V2 for container orchestration
- Minimum 4GB RAM (8GB+ recommended for concurrent ML training workloads)
- Available ports 5000, 8888, 9000, and 9001 for service access
- Basic familiarity with Python ML libraries and Jupyter notebook environments
- Understanding of S3 API concepts for MinIO bucket and object management
- Knowledge of MLflow tracking concepts including runs, experiments, and artifacts
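Before starting the stack, it can help to confirm that the four host ports listed above are actually free. The following Python sketch is one way to check; the function name port_is_free and the service labels are illustrative. On some macOS versions port 5000 may already be taken by the AirPlay Receiver service.

```python
import socket

# Host ports published by the compose file, with the service behind each.
REQUIRED_PORTS = {
    5000: "MLflow UI",
    8888: "JupyterLab",
    9000: "MinIO S3 API",
    9001: "MinIO Console",
}


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when a listener accepted the connection.
        return sock.connect_ex((host, port)) != 0


if __name__ == "__main__":
    for port, service in REQUIRED_PORTS.items():
        state = "free" if port_is_free(port) else "IN USE"
        print(f"{port:<5} {service:<14} {state}")
```

If a port is reported as in use, either stop the conflicting process or change the host side of the mapping in the compose file (for example "5001:5000").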
Intended for development and testing. Review security settings, change the default credentials, and test thoroughly before production use.
docker-compose.yml
services:
  jupyter:
    image: jupyter/scipy-notebook:latest
    ports:
      - "8888:8888"
    environment:
      - MLFLOW_TRACKING_URI=http://mlflow:5000
      # The MLflow client uploads artifacts straight to S3-compatible storage,
      # so the notebook needs the MinIO endpoint as well.
      - MLFLOW_S3_ENDPOINT_URL=http://minio:9000
      - AWS_ACCESS_KEY_ID=${MINIO_ACCESS_KEY}
      - AWS_SECRET_ACCESS_KEY=${MINIO_SECRET_KEY}
      - AWS_DEFAULT_REGION=us-east-1
    volumes:
      - notebooks:/home/jovyan/work
      - ./dvc-config:/home/jovyan/.dvc
    command: start-notebook.sh --NotebookApp.token=${JUPYTER_TOKEN}
    networks:
      - mlstack-net
    restart: unless-stopped

  mlflow:
    image: ghcr.io/mlflow/mlflow:latest
    ports:
      - "5000:5000"
    environment:
      - MLFLOW_BACKEND_STORE_URI=postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/mlflow
      - MLFLOW_DEFAULT_ARTIFACT_ROOT=s3://mlflow-artifacts
      - AWS_ACCESS_KEY_ID=${MINIO_ACCESS_KEY}
      - AWS_SECRET_ACCESS_KEY=${MINIO_SECRET_KEY}
      - MLFLOW_S3_ENDPOINT_URL=http://minio:9000
    command: mlflow server --host 0.0.0.0 --port 5000
    depends_on:
      postgres:
        condition: service_healthy
      minio:
        condition: service_started
    networks:
      - mlstack-net
    restart: unless-stopped

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: mlflow
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER}"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - mlstack-net
    restart: unless-stopped

  minio:
    image: minio/minio:latest
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      MINIO_ROOT_USER: ${MINIO_ACCESS_KEY}
      MINIO_ROOT_PASSWORD: ${MINIO_SECRET_KEY}
    volumes:
      - minio_data:/data
    command: server /data --console-address ":9001"
    networks:
      - mlstack-net
    restart: unless-stopped

  # One-shot helper that creates the two buckets, then exits.
  minio-init:
    image: minio/mc:latest
    depends_on:
      - minio
    entrypoint: >
      /bin/sh -c "
      sleep 5;
      mc alias set myminio http://minio:9000 ${MINIO_ACCESS_KEY} ${MINIO_SECRET_KEY};
      mc mb myminio/mlflow-artifacts --ignore-existing;
      mc mb myminio/dvc-storage --ignore-existing;
      exit 0;
      "
    networks:
      - mlstack-net

volumes:
  notebooks:
  postgres_data:
  minio_data:

networks:
  mlstack-net:
    driver: bridge

.env Template
.env
# Jupyter Configuration
JUPYTER_TOKEN=secure_jupyter_token

# PostgreSQL Configuration
POSTGRES_USER=mlflow
POSTGRES_PASSWORD=secure_postgres_password

# MinIO Configuration
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=secure_minio_password

Usage Notes
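- JupyterLab at http://localhost:8888
- MLflow UI at http://localhost:5000
- MinIO Console at http://localhost:9001
- Configure the DVC remote: dvc remote add -d minio s3://dvc-storage, then point it at MinIO with dvc remote modify minio endpointurl http://minio:9000 (use http://localhost:9000 when running DVC on the host)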
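The services take a few seconds to come up, so the URLs above may refuse connections right after docker compose up -d. A small polling helper can wait for them; the sketch below is one possible approach. The function names are illustrative, and the health paths are assumptions based on current releases: MLflow answers on /health and MinIO on /minio/health/live.

```python
import time
import urllib.error
import urllib.request

# Host-side health URLs (assumed paths, see the note above).
ENDPOINTS = {
    "mlflow": "http://localhost:5000/health",
    "minio": "http://localhost:9000/minio/health/live",
}


def http_ok(url, timeout=2.0):
    """Return True if the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(check, attempts=30, delay=2.0, sleep=time.sleep):
    """Call check() until it returns True; return the attempt number or None."""
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None


if __name__ == "__main__":
    for name, url in ENDPOINTS.items():
        used = wait_until_ready(lambda: http_ok(url))
        print(f"{name}: {'ready' if used else 'not reachable'}")
```

Passing the check as a callable keeps the retry loop independent of HTTP, so the same helper could wrap any other readiness test.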
Individual Services (5 services)
Copy individual services to mix and match with your existing compose files.
jupyter
jupyter:
  image: jupyter/scipy-notebook:latest
  ports:
    - "8888:8888"
  environment:
    - MLFLOW_TRACKING_URI=http://mlflow:5000
    # The MLflow client uploads artifacts straight to S3-compatible storage,
    # so the notebook needs the MinIO endpoint as well.
    - MLFLOW_S3_ENDPOINT_URL=http://minio:9000
    - AWS_ACCESS_KEY_ID=${MINIO_ACCESS_KEY}
    - AWS_SECRET_ACCESS_KEY=${MINIO_SECRET_KEY}
    - AWS_DEFAULT_REGION=us-east-1
  volumes:
    - notebooks:/home/jovyan/work
    - ./dvc-config:/home/jovyan/.dvc
  command: start-notebook.sh --NotebookApp.token=${JUPYTER_TOKEN}
  networks:
    - mlstack-net
  restart: unless-stopped
mlflow
mlflow:
  image: ghcr.io/mlflow/mlflow:latest
  ports:
    - "5000:5000"
  environment:
    - MLFLOW_BACKEND_STORE_URI=postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/mlflow
    - MLFLOW_DEFAULT_ARTIFACT_ROOT=s3://mlflow-artifacts
    - AWS_ACCESS_KEY_ID=${MINIO_ACCESS_KEY}
    - AWS_SECRET_ACCESS_KEY=${MINIO_SECRET_KEY}
    - MLFLOW_S3_ENDPOINT_URL=http://minio:9000
  command: mlflow server --host 0.0.0.0 --port 5000
  depends_on:
    postgres:
      condition: service_healthy
    minio:
      condition: service_started
  networks:
    - mlstack-net
  restart: unless-stopped
postgres
postgres:
  image: postgres:16-alpine
  environment:
    POSTGRES_USER: ${POSTGRES_USER}
    POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    POSTGRES_DB: mlflow
  volumes:
    - postgres_data:/var/lib/postgresql/data
  healthcheck:
    test:
      - CMD-SHELL
      - pg_isready -U ${POSTGRES_USER}
    interval: 10s
    timeout: 5s
    retries: 5
  networks:
    - mlstack-net
  restart: unless-stopped
minio
minio:
  image: minio/minio:latest
  ports:
    - "9000:9000"
    - "9001:9001"
  environment:
    MINIO_ROOT_USER: ${MINIO_ACCESS_KEY}
    MINIO_ROOT_PASSWORD: ${MINIO_SECRET_KEY}
  volumes:
    - minio_data:/data
  command: server /data --console-address ":9001"
  networks:
    - mlstack-net
  restart: unless-stopped
minio-init
minio-init:
  image: minio/mc:latest
  depends_on:
    - minio
  entrypoint: |
    /bin/sh -c " sleep 5; mc alias set myminio http://minio:9000 ${MINIO_ACCESS_KEY} ${MINIO_SECRET_KEY}; mc mb myminio/mlflow-artifacts --ignore-existing; mc mb myminio/dvc-storage --ignore-existing; exit 0; "
  networks:
    - mlstack-net
Quick Start
terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  jupyter:
    image: jupyter/scipy-notebook:latest
    ports:
      - "8888:8888"
    environment:
      - MLFLOW_TRACKING_URI=http://mlflow:5000
      # The MLflow client uploads artifacts straight to S3-compatible storage,
      # so the notebook needs the MinIO endpoint as well.
      - MLFLOW_S3_ENDPOINT_URL=http://minio:9000
      - AWS_ACCESS_KEY_ID=${MINIO_ACCESS_KEY}
      - AWS_SECRET_ACCESS_KEY=${MINIO_SECRET_KEY}
      - AWS_DEFAULT_REGION=us-east-1
    volumes:
      - notebooks:/home/jovyan/work
      - ./dvc-config:/home/jovyan/.dvc
    command: start-notebook.sh --NotebookApp.token=${JUPYTER_TOKEN}
    networks:
      - mlstack-net
    restart: unless-stopped

  mlflow:
    image: ghcr.io/mlflow/mlflow:latest
    ports:
      - "5000:5000"
    environment:
      - MLFLOW_BACKEND_STORE_URI=postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/mlflow
      - MLFLOW_DEFAULT_ARTIFACT_ROOT=s3://mlflow-artifacts
      - AWS_ACCESS_KEY_ID=${MINIO_ACCESS_KEY}
      - AWS_SECRET_ACCESS_KEY=${MINIO_SECRET_KEY}
      - MLFLOW_S3_ENDPOINT_URL=http://minio:9000
    command: mlflow server --host 0.0.0.0 --port 5000
    depends_on:
      postgres:
        condition: service_healthy
      minio:
        condition: service_started
    networks:
      - mlstack-net
    restart: unless-stopped

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: mlflow
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER}"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - mlstack-net
    restart: unless-stopped

  minio:
    image: minio/minio:latest
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      MINIO_ROOT_USER: ${MINIO_ACCESS_KEY}
      MINIO_ROOT_PASSWORD: ${MINIO_SECRET_KEY}
    volumes:
      - minio_data:/data
    command: server /data --console-address ":9001"
    networks:
      - mlstack-net
    restart: unless-stopped

  minio-init:
    image: minio/mc:latest
    depends_on:
      - minio
    entrypoint: >
      /bin/sh -c "
      sleep 5;
      mc alias set myminio http://minio:9000 ${MINIO_ACCESS_KEY} ${MINIO_SECRET_KEY};
      mc mb myminio/mlflow-artifacts --ignore-existing;
      mc mb myminio/dvc-storage --ignore-existing;
      exit 0;
      "
    networks:
      - mlstack-net

volumes:
  notebooks:
  postgres_data:
  minio_data:

networks:
  mlstack-net:
    driver: bridge
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# Jupyter Configuration
JUPYTER_TOKEN=secure_jupyter_token

# PostgreSQL Configuration
POSTGRES_USER=mlflow
POSTGRES_PASSWORD=secure_postgres_password

# MinIO Configuration
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=secure_minio_password
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f

One-Liner
Run this command to download and set up the recipe in one step:
terminal
curl -fsSL https://docker.recipes/api/recipes/dvc-mlflow-stack/run | bash

Troubleshooting
- MLflow UI shows 'Connection refused' error: Ensure the PostgreSQL container is healthy before MLflow starts, and check the format of the database connection string. If the MLflow logs report a missing psycopg2 or boto3 module, the image in use may not ship those drivers and they would need to be installed in a derived image.
- Jupyter cannot connect to MLflow tracking server: Verify that the MLFLOW_TRACKING_URI environment variable points to http://mlflow:5000, which resolves only inside the Docker network.
- MinIO bucket creation fails during initialization: Check that MINIO_ACCESS_KEY and MINIO_SECRET_KEY are set in the .env file, then restart the minio-init container.
- Large model artifacts fail to upload to MLflow: The client uploads artifacts directly to MinIO, so verify that the AWS credentials and MLFLOW_S3_ENDPOINT_URL (http://minio:9000) are set in the Jupyter environment, and check the MinIO logs for rejected requests.
- DVC remote configuration errors: Ensure the dvc-storage bucket exists and that the DVC remote for s3://dvc-storage has its endpointurl set to the MinIO service.
- PostgreSQL connection pool exhaustion: Increase max_connections in the PostgreSQL configuration or reduce the number of concurrent MLflow experiment runs.
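Several of the problems above trace back to a missing or malformed value in the .env file. The following Python sketch checks for that before the stack is started. The function names are illustrative, the parser handles only simple KEY=VALUE lines, and the length checks reflect MinIO's minimums for root credentials (3 characters for the user, 8 for the password).

```python
from pathlib import Path

# Keys referenced by the compose file in this recipe.
REQUIRED_KEYS = (
    "JUPYTER_TOKEN",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "MINIO_ACCESS_KEY",
    "MINIO_SECRET_KEY",
)


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def find_problems(env):
    """Return human-readable problems found in the parsed values."""
    problems = [
        f"{key} is missing or empty" for key in REQUIRED_KEYS if not env.get(key)
    ]
    # MinIO refuses to start with root credentials that are too short.
    if 0 < len(env.get("MINIO_ACCESS_KEY", "")) < 3:
        problems.append("MINIO_ACCESS_KEY must be at least 3 characters")
    if 0 < len(env.get("MINIO_SECRET_KEY", "")) < 8:
        problems.append("MINIO_SECRET_KEY must be at least 8 characters")
    return problems


if __name__ == "__main__":
    env_file = Path(".env")
    if not env_file.exists():
        print("No .env file in the current directory")
    else:
        found = find_problems(parse_env(env_file.read_text()))
        for problem in found or ["No problems found"]:
            print(problem)
```

Run it from the directory that holds docker-compose.yml and .env; an empty problem list does not guarantee the credentials are strong, only that they are present and long enough for MinIO to accept.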
Components
mlflow, minio, postgresql, jupyter
Tags
#dvc #mlflow #data-versioning #experiment-tracking #reproducibility
Category
AI & Machine Learning