
ClearML

intermediate

ML experiment tracking and automation platform.

Overview

ClearML is an open-source MLOps platform that automates machine learning experiment tracking, model management, and workflow orchestration. Originally developed by Allegro AI, ClearML addresses the critical challenge of reproducibility in machine learning by automatically capturing experiment parameters, metrics, and artifacts without requiring code changes. It integrates with popular ML frameworks like PyTorch, TensorFlow, and scikit-learn to provide comprehensive experiment lineage and model versioning capabilities.

This stack combines ClearML's three core services with a robust data infrastructure: MongoDB stores experiment metadata and project configurations, Elasticsearch indexes experiment logs and metrics for fast search and retrieval, and Redis provides high-speed caching for real-time experiment monitoring and task queuing. The architecture separates concerns with dedicated containers for the web interface, API server, and file storage, enabling scalable deployment and efficient resource utilization.

Data scientists and ML engineers working on teams will find this configuration invaluable for establishing experiment reproducibility, comparing model performance across iterations, and automating ML pipelines. Organizations transitioning from ad-hoc ML development to structured MLOps practices benefit from ClearML's ability to retrofit existing codebases with tracking capabilities, while the included data stack ensures reliable storage and fast querying of experiment history even as projects scale to thousands of experiments.

Key Features

  • Automatic experiment tracking for PyTorch, TensorFlow, Keras, and scikit-learn without code modifications
  • Web-based experiment comparison with interactive plots and hyperparameter visualization
  • MongoDB-backed experiment metadata storage with flexible schema for diverse ML frameworks
  • Elasticsearch-powered full-text search across experiment logs, parameters, and model descriptions
  • Redis-accelerated real-time experiment monitoring and distributed task queue management
  • File server for artifact storage including model checkpoints, datasets, and visualization assets
  • REST API server enabling programmatic access to experiment data and remote task execution (see the probe sketch after this list)
  • Agent-based remote execution system for distributed training and hyperparameter optimization
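
A quick way to see the REST API and the service split in action once the stack is running is to probe each published port from the host. This is only a liveness sketch: it assumes the default port mappings from the compose file below and the debug.ping endpoint that ClearML's own healthchecks use.

terminal
# Probe the three ClearML endpoints (assumes default ports and the debug.ping
# endpoint; authenticated API calls additionally need credentials from the web UI)
for url in http://localhost:8080 http://localhost:8008/debug.ping http://localhost:8081; do
  printf '%s -> ' "$url"
  curl -s -o /dev/null -w '%{http_code}\n' "$url"
done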

Common Use Cases

  • ML research teams tracking hundreds of hyperparameter tuning experiments across multiple models
  • Data science organizations implementing experiment reproducibility standards and audit trails
  • Distributed machine learning workflows requiring remote agent execution on GPU clusters
  • Model comparison and selection processes with automated metric tracking and visualization
  • MLOps pipeline automation with programmatic experiment triggering and result analysis
  • Academic research groups needing collaborative experiment sharing and progress tracking
  • Production ML teams managing model versioning and deployment artifact lineage

Prerequisites

  • Minimum 6GB RAM (2GB for Elasticsearch, 2GB for MongoDB, 1GB for ClearML services, 1GB system overhead)
  • Docker and Docker Compose with support for multi-container networking and volume persistence
  • Ports 8080, 8008, and 8081 available for ClearML web interface, API server, and file server
  • Understanding of machine learning workflows and experiment tracking concepts
  • Python development environment for ClearML SDK installation and configuration
  • Sufficient disk space for experiment artifacts, model checkpoints, and database storage (recommend 50GB+)
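
Before starting the stack, the checks below can confirm the prerequisites above on a Linux host. They are only a sketch; the vm.max_map_count check matters because Elasticsearch refuses to start when the value is too low (see Troubleshooting).

terminal
# Ports 8080, 8008, and 8081 should be free (no grep output means they are)
ss -ltn | grep -E ':(8080|8008|8081)\b' || echo "required ports are free"

# Elasticsearch needs vm.max_map_count of at least 262144 on the Docker host
sysctl vm.max_map_count

# Roughly 6GB of RAM and 50GB+ of disk are recommended for this stack
free -h
df -h .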

This recipe is intended for development and testing. Review security settings, change default credentials, and test thoroughly before production use.

docker-compose.yml

docker-compose.yml
services:
  clearml-webserver:
    image: allegroai/clearml:latest
    container_name: clearml-webserver
    restart: unless-stopped
    environment:
      CLEARML_HOST_IP: localhost
      CLEARML__apiserver__mongo__host: mongo
      CLEARML__apiserver__es__host: elasticsearch
    ports:
      - "8080:8080"
    depends_on:
      - mongo
      - elasticsearch
      - redis
    networks:
      - clearml

  clearml-apiserver:
    image: allegroai/clearml:latest
    container_name: clearml-apiserver
    restart: unless-stopped
    command: apiserver
    environment:
      CLEARML__apiserver__mongo__host: mongo
      CLEARML__apiserver__es__host: elasticsearch
      CLEARML__redis__host: redis
    ports:
      - "8008:8008"
    depends_on:
      - mongo
      - elasticsearch
      - redis
    networks:
      - clearml

  clearml-fileserver:
    image: allegroai/clearml:latest
    container_name: clearml-fileserver
    command: fileserver
    ports:
      - "8081:8081"
    volumes:
      - clearml_files:/mnt/fileserver
    networks:
      - clearml

  mongo:
    image: mongo:4.4
    container_name: clearml-mongo
    volumes:
      - mongo_data:/data/db
    networks:
      - clearml

  elasticsearch:
    image: elasticsearch:7.17.0
    container_name: clearml-es
    environment:
      - discovery.type=single-node
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
    volumes:
      - es_data:/usr/share/elasticsearch/data
    networks:
      - clearml

  redis:
    image: redis:alpine
    container_name: clearml-redis
    networks:
      - clearml

volumes:
  clearml_files:
  mongo_data:
  es_data:

networks:
  clearml:
    driver: bridge

.env Template

.env
# Configure credentials via web UI

Usage Notes

  1. Docs: https://clear.ml/docs/latest/
  2. Web UI at http://localhost:8080 - configure credentials on first visit
  3. API server at http://localhost:8008, file server at :8081
  4. Python SDK: pip install clearml, then clearml-init to configure
  5. Auto-logs PyTorch, TensorFlow, scikit-learn experiments
  6. Agent for remote execution: clearml-agent daemon --queue default
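
Expanding on notes 4 and 6 above: clearml-init walks you through pasting credentials generated in the web UI, but the SDK and agent can also be pointed at this stack non-interactively. The sketch below assumes ClearML's CLEARML_* environment variables and the default ports; the key values are placeholders you create in the web UI.

terminal
# Install the SDK and the agent
pip install clearml clearml-agent

# Interactive setup: paste the credentials generated in the web UI
clearml-init

# Non-interactive alternative (assumes the CLEARML_* environment variables the
# SDK/agent read; replace the placeholder keys with ones created in the web UI)
export CLEARML_API_HOST=http://localhost:8008
export CLEARML_WEB_HOST=http://localhost:8080
export CLEARML_FILES_HOST=http://localhost:8081
export CLEARML_API_ACCESS_KEY="<your-access-key>"
export CLEARML_API_SECRET_KEY="<your-secret-key>"

# Start a worker that pulls queued tasks for remote execution
clearml-agent daemon --queue default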

Individual Services (6 services)

Copy individual services to mix and match with your existing compose files.

clearml-webserver
clearml-webserver:
  image: allegroai/clearml:latest
  container_name: clearml-webserver
  restart: unless-stopped
  environment:
    CLEARML_HOST_IP: localhost
    CLEARML__apiserver__mongo__host: mongo
    CLEARML__apiserver__es__host: elasticsearch
  ports:
    - "8080:8080"
  depends_on:
    - mongo
    - elasticsearch
    - redis
  networks:
    - clearml
clearml-apiserver
clearml-apiserver:
  image: allegroai/clearml:latest
  container_name: clearml-apiserver
  restart: unless-stopped
  command: apiserver
  environment:
    CLEARML__apiserver__mongo__host: mongo
    CLEARML__apiserver__es__host: elasticsearch
    CLEARML__redis__host: redis
  ports:
    - "8008:8008"
  depends_on:
    - mongo
    - elasticsearch
    - redis
  networks:
    - clearml
clearml-fileserver
clearml-fileserver:
  image: allegroai/clearml:latest
  container_name: clearml-fileserver
  command: fileserver
  ports:
    - "8081:8081"
  volumes:
    - clearml_files:/mnt/fileserver
  networks:
    - clearml
mongo
mongo:
  image: mongo:4.4
  container_name: clearml-mongo
  volumes:
    - mongo_data:/data/db
  networks:
    - clearml
elasticsearch
elasticsearch:
  image: elasticsearch:7.17.0
  container_name: clearml-es
  environment:
    - discovery.type=single-node
    - ES_JAVA_OPTS=-Xms512m -Xmx512m
  volumes:
    - es_data:/usr/share/elasticsearch/data
  networks:
    - clearml
redis
redis:
  image: redis:alpine
  container_name: clearml-redis
  networks:
    - clearml
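
If you go the mix-and-match route, one low-friction option is Docker Compose's multi-file merge rather than editing your existing file: keep the ClearML services, plus the volumes and networks sections they reference, in a separate file and pass both files at runtime. The file names here are illustrative.

terminal
# Preview the merged configuration, then start everything together
# (clearml.yml is assumed to hold the services above plus their volumes/networks)
docker compose -f docker-compose.yml -f clearml.yml config
docker compose -f docker-compose.yml -f clearml.yml up -d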

Quick Start

terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  clearml-webserver:
    image: allegroai/clearml:latest
    container_name: clearml-webserver
    restart: unless-stopped
    environment:
      CLEARML_HOST_IP: localhost
      CLEARML__apiserver__mongo__host: mongo
      CLEARML__apiserver__es__host: elasticsearch
    ports:
      - "8080:8080"
    depends_on:
      - mongo
      - elasticsearch
      - redis
    networks:
      - clearml

  clearml-apiserver:
    image: allegroai/clearml:latest
    container_name: clearml-apiserver
    restart: unless-stopped
    command: apiserver
    environment:
      CLEARML__apiserver__mongo__host: mongo
      CLEARML__apiserver__es__host: elasticsearch
      CLEARML__redis__host: redis
    ports:
      - "8008:8008"
    depends_on:
      - mongo
      - elasticsearch
      - redis
    networks:
      - clearml

  clearml-fileserver:
    image: allegroai/clearml:latest
    container_name: clearml-fileserver
    command: fileserver
    ports:
      - "8081:8081"
    volumes:
      - clearml_files:/mnt/fileserver
    networks:
      - clearml

  mongo:
    image: mongo:4.4
    container_name: clearml-mongo
    volumes:
      - mongo_data:/data/db
    networks:
      - clearml

  elasticsearch:
    image: elasticsearch:7.17.0
    container_name: clearml-es
    environment:
      - discovery.type=single-node
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
    volumes:
      - es_data:/usr/share/elasticsearch/data
    networks:
      - clearml

  redis:
    image: redis:alpine
    container_name: clearml-redis
    networks:
      - clearml

volumes:
  clearml_files:
  mongo_data:
  es_data:

networks:
  clearml:
    driver: bridge
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# Configure credentials via web UI
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
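
After step 4, two checks are usually enough to confirm a healthy first boot; the API server tends to be the last component to settle while it connects to MongoDB and Elasticsearch.

terminal
# All six containers should show as running
docker compose ps

# Watch the API server until MongoDB/Elasticsearch connection errors stop appearing
docker compose logs -f clearml-apiserver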

One-Liner

Run this command to download and set up the recipe in one step:

terminal
curl -fsSL https://docker.recipes/api/recipes/clearml/run | bash

Troubleshooting

  • ClearML web interface shows 'Server not responding': Verify clearml-apiserver container is running and MongoDB connection is established
  • Elasticsearch container exits with 'max virtual memory areas vm.max_map_count too low': Run 'sysctl -w vm.max_map_count=262144' on the Docker host (a persistent fix is sketched below this list)
  • Experiment tracking fails with 'Could not connect to API server': Ensure port 8008 is accessible and run 'clearml-init' to configure SDK credentials
  • MongoDB connection errors in API server logs: Check that mongo container is fully started before clearml-apiserver attempts connection
  • File upload failures to clearml-fileserver: Verify clearml_files volume has proper write permissions and sufficient disk space
  • Redis connection timeouts during high experiment load: Increase Redis memory limit or enable Redis persistence for data durability
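
For the vm.max_map_count error above, the commands below apply the fix immediately and persist it across reboots on a Linux host; the drop-in file name is only an example.

terminal
# Apply the kernel setting now
sudo sysctl -w vm.max_map_count=262144

# Persist it across reboots (file name is illustrative)
echo 'vm.max_map_count=262144' | sudo tee /etc/sysctl.d/99-clearml-elasticsearch.conf
sudo sysctl --system

# Restart Elasticsearch and confirm it stays up
docker compose up -d elasticsearch
docker compose ps elasticsearch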

Download Recipe Kit

Get all files in a ready-to-deploy package

Includes docker-compose.yml, .env template, README, and license
