docker.recipes

Prometheus HA Stack

advanced

High-availability Prometheus setup with Thanos for long-term storage.

Overview

Prometheus is an open-source monitoring and alerting toolkit, originally developed at SoundCloud in 2012. It popularized pull-based metrics collection and introduced PromQL, a query language for time-series data. Its dimensional data model, in which every series is identified by a metric name plus a set of labels, allows flexible aggregation and filtering across dynamic environments such as Kubernetes clusters and microservice architectures.

This stack pairs Prometheus with Thanos to address three limitations of a standalone Prometheus server: storage bounded by local disk, no global query view across instances, and a single point of failure. A Thanos sidecar runs next to each Prometheus instance; it uploads completed TSDB blocks to MinIO object storage and exposes the instance's recent data to Thanos Query, which fans each query out to all sources and deduplicates the replicas. The store gateway serves historical blocks directly from object storage, Alertmanager handles notification routing, and Grafana provides dashboards on top of Thanos Query.

The pattern suits organizations running several Prometheus instances in production that need long-term retention, global queries across instances or data centers, and redundant scraping. Platform engineering and SRE teams, as well as enterprises with compliance requirements for long-term metrics retention, can use it to keep observability at scale without paying for a vendor-managed service. As packaged here, every component runs on a single Docker host, so the host itself remains a single point of failure; treat the recipe as a working model of the architecture rather than a finished production HA deployment.

Key Features

  • Dual Prometheus instances with Thanos sidecars for redundant scraping, so queries are still answered if one instance goes down
  • Long-term metrics retention, limited only by object storage capacity, through the Thanos store gateway and MinIO
  • Global querying across all Prometheus instances via Thanos Query with deduplication
  • S3-compatible object storage with MinIO for cost-effective long-term data persistence
  • Block compaction and downsampling once a Thanos Compactor service is added (this compose file does not include one)
  • Replica label deduplication to handle identical metrics from multiple Prometheus instances
  • Integrated Alertmanager for centralized alert routing and notification management
  • Grafana integration with Thanos Query for unified dashboards across all data sources
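Replica-label deduplication only works if each Prometheus instance carries a distinct `replica` external label whose name matches `--query.replica-label=replica` on thanos-query; the Thanos sidecar also refuses to start when Prometheus has no external labels at all. The recipe mounts ./prometheus1.yml and ./prometheus2.yml but does not ship them, so the sketch below generates both. The `cluster` label value, the 15s intervals, and the scrape jobs are illustrative assumptions; the `alerting` block points both instances at the alertmanager service and drops the `replica` label from alerts so the two copies of each alert look identical to Alertmanager.

```shell
#!/bin/sh
# Sketch: generate prometheus1.yml and prometheus2.yml for the HA pair.
# Assumptions: "docker-ha", the 15s intervals and the scrape targets are examples.
set -eu

for i in 1 2; do
  cat > "prometheus${i}.yml" << EOF
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  # External labels identify this instance to Thanos. Both replicas share
  # "cluster" and differ only in "replica", which thanos-query removes
  # (--query.replica-label=replica) when it deduplicates series.
  external_labels:
    cluster: docker-ha
    replica: prometheus${i}

alerting:
  # Drop the replica label so both instances send identical alerts,
  # which Alertmanager then deduplicates.
  alert_relabel_configs:
    - action: labeldrop
      regex: replica
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

scrape_configs:
  # Both replicas scrape the same targets so either one can answer alone.
  - job_name: prometheus
    static_configs:
      - targets: ["prometheus1:9090", "prometheus2:9090"]
  - job_name: thanos
    static_configs:
      - targets: ["thanos-sidecar1:10902", "thanos-sidecar2:10902", "thanos-store:10902"]
EOF
done
```

Because the files are bind-mounted read-only and `--web.enable-lifecycle` is set, later edits can be applied with `docker compose restart prometheus1 prometheus2` or a POST to each instance's `/-/reload` endpoint.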

Common Use Cases

  • Multi-region Kubernetes monitoring with global metrics visibility across clusters
  • Enterprise infrastructure monitoring requiring years of metrics retention for compliance
  • Platform engineering teams managing dozens of microservices across multiple environments
  • Organizations migrating from vendor solutions like Datadog to reduce monitoring costs
  • High-traffic SaaS applications needing reliable monitoring with zero data loss tolerance
  • Financial services requiring long-term metrics storage for regulatory compliance audits
  • Research institutions analyzing historical performance trends over extended periods

Prerequisites

  • Docker host with minimum 6GB RAM to support all components (Prometheus instances need 1GB+ each)
  • At least 20GB available storage for MinIO object storage and local Prometheus data
  • Free host ports 3000, 9000, 9001, 9090, 9091, 9093, and 10902 for the published service interfaces
  • Understanding of PromQL query language and Prometheus configuration syntax
  • Familiarity with time-series data concepts and metrics cardinality considerations
  • Basic knowledge of object storage concepts and S3-compatible APIs for troubleshooting
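Beyond host resources, the stack needs several local files. When a bind-mount source such as ./bucket.yml is missing, Docker creates an empty directory in its place and the container fails with a confusing error. A small preflight check, run from the project directory, catches this before `docker compose up -d`. The function name `preflight` and the script name used in the usage note are hypothetical; the file list mirrors the compose file, its .env, and the bind mounts it declares.

```shell
#!/bin/sh
# Hypothetical preflight helper: run it in the directory holding docker-compose.yml.
preflight() {
  status=0
  # The compose file, its .env, and every host file it bind-mounts.
  # A missing bind-mount source would be replaced by an empty directory.
  for f in docker-compose.yml .env bucket.yml prometheus1.yml prometheus2.yml alertmanager.yml; do
    if [ ! -f "$f" ]; then
      echo "missing file: $f" >&2
      status=1
    fi
  done
  # The compose file interpolates these variables from .env; each needs a value.
  if [ -f .env ]; then
    for var in GRAFANA_PASSWORD MINIO_ACCESS_KEY MINIO_SECRET_KEY; do
      if ! grep -q "^${var}=." .env; then
        echo "unset or empty in .env: $var" >&2
        status=1
      fi
    done
  fi
  return "$status"
}

# Usage (assuming this is saved as preflight.sh):
#   . ./preflight.sh && preflight && docker compose up -d
```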

For development & testing. Review security settings, change default credentials, and test thoroughly before production use.

docker-compose.yml

services:
  prometheus1:
    image: prom/prometheus:latest
    container_name: prometheus1
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=2h
      - --storage.tsdb.min-block-duration=2h
      - --storage.tsdb.max-block-duration=2h
      - --web.enable-lifecycle
      - --web.enable-admin-api
    volumes:
      - ./prometheus1.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus1-data:/prometheus
    ports:
      - "9090:9090"
    networks:
      - prometheus-network
    restart: unless-stopped

  thanos-sidecar1:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-sidecar1
    command:
      - sidecar
      - --tsdb.path=/prometheus
      - --prometheus.url=http://prometheus1:9090
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:10902
      - --objstore.config-file=/etc/thanos/bucket.yml
    volumes:
      # Writable: the sidecar records uploaded blocks in thanos.shipper.json here
      - prometheus1-data:/prometheus
      - ./bucket.yml:/etc/thanos/bucket.yml:ro
    depends_on:
      - prometheus1
      - minio
    networks:
      - prometheus-network
    restart: unless-stopped

  prometheus2:
    image: prom/prometheus:latest
    container_name: prometheus2
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=2h
      - --storage.tsdb.min-block-duration=2h
      - --storage.tsdb.max-block-duration=2h
      - --web.enable-lifecycle
      - --web.enable-admin-api
    volumes:
      - ./prometheus2.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus2-data:/prometheus
    ports:
      - "9091:9090"
    networks:
      - prometheus-network
    restart: unless-stopped

  thanos-sidecar2:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-sidecar2
    command:
      - sidecar
      - --tsdb.path=/prometheus
      - --prometheus.url=http://prometheus2:9090
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:10902
      - --objstore.config-file=/etc/thanos/bucket.yml
    volumes:
      # Writable: the sidecar records uploaded blocks in thanos.shipper.json here
      - prometheus2-data:/prometheus
      - ./bucket.yml:/etc/thanos/bucket.yml:ro
    depends_on:
      - prometheus2
      - minio
    networks:
      - prometheus-network
    restart: unless-stopped

  thanos-query:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-query
    command:
      - query
      - --http-address=0.0.0.0:9090
      - --grpc-address=0.0.0.0:10901
      - --store=thanos-sidecar1:10901
      - --store=thanos-sidecar2:10901
      - --store=thanos-store:10901
      - --query.replica-label=replica
    ports:
      - "10902:9090"
    networks:
      - prometheus-network
    restart: unless-stopped

  thanos-store:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-store
    command:
      - store
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:10902
      - --data-dir=/data
      - --objstore.config-file=/etc/thanos/bucket.yml
    volumes:
      - thanos-store-data:/data
      - ./bucket.yml:/etc/thanos/bucket.yml:ro
    depends_on:
      - minio
    networks:
      - prometheus-network
    restart: unless-stopped

  minio:
    image: minio/minio:latest
    container_name: thanos-minio
    command: server /data --console-address ":9001"
    environment:
      - MINIO_ROOT_USER=${MINIO_ACCESS_KEY}
      - MINIO_ROOT_PASSWORD=${MINIO_SECRET_KEY}
    volumes:
      - minio-data:/data
    ports:
      - "9000:9000"
      - "9001:9001"
    networks:
      - prometheus-network
    restart: unless-stopped

  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
      - alertmanager-data:/alertmanager
    ports:
      - "9093:9093"
    networks:
      - prometheus-network
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
    volumes:
      - grafana-data:/var/lib/grafana
    ports:
      - "3000:3000"
    networks:
      - prometheus-network
    restart: unless-stopped

volumes:
  prometheus1-data:
  prometheus2-data:
  thanos-store-data:
  minio-data:
  alertmanager-data:
  grafana-data:

networks:
  prometheus-network:
    driver: bridge
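The compose file above runs no Thanos Compactor, so nothing compacts or downsamples the blocks that accumulate in MinIO. One way to add it without editing the main file is a docker-compose.override.yml, which `docker compose` merges automatically when it sits next to docker-compose.yml. This is a sketch: the service name and the retention values are assumptions to tune, and only one compactor should ever run against a given bucket.

```shell
#!/bin/sh
# Sketch: add a Thanos Compactor through a Compose override file.
# Assumption: the retention periods below are examples, not recommendations.
cat > docker-compose.override.yml << 'EOF'
services:
  thanos-compact:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-compact
    command:
      - compact
      - --wait                      # keep running instead of exiting after one pass
      - --data-dir=/data
      - --objstore.config-file=/etc/thanos/bucket.yml
      - --http-address=0.0.0.0:10902
      - --retention.resolution-raw=30d
      - --retention.resolution-5m=90d
      - --retention.resolution-1h=1y
    volumes:
      - thanos-compact-data:/data
      - ./bucket.yml:/etc/thanos/bucket.yml:ro
    depends_on:
      - minio
    networks:
      - prometheus-network
    restart: unless-stopped

volumes:
  thanos-compact-data:
EOF
```

Running `docker compose config` afterwards shows the merged result, which is a quick way to confirm the override was picked up.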

.env Template

.env
# Prometheus HA with Thanos
GRAFANA_PASSWORD=secure_grafana_password
MINIO_ACCESS_KEY=thanos
MINIO_SECRET_KEY=thanos_secret

# Create bucket.yml:
# type: S3
# config:
#   bucket: thanos
#   endpoint: minio:9000
#   access_key: thanos
#   secret_key: thanos_secret
#   insecure: true
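The credentials in bucket.yml must match MINIO_ACCESS_KEY and MINIO_SECRET_KEY, so rendering bucket.yml from .env keeps the two in sync after a password change. This sketch assumes .env contains plain KEY=value lines like the template above and that the secrets contain no characters that need YAML quoting; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: render_bucket_yml [ENV_FILE]
# Writes ./bucket.yml using the MinIO credentials from ENV_FILE (default ./.env).
render_bucket_yml() {
  env_file=${1:-./.env}
  # "." searches PATH for names without a slash, so force a relative path.
  case "$env_file" in
    */*) ;;
    *) env_file="./$env_file" ;;
  esac
  set -a                 # export everything the env file defines
  . "$env_file"
  set +a
  cat > bucket.yml << EOF
type: S3
config:
  bucket: thanos
  endpoint: minio:9000
  access_key: ${MINIO_ACCESS_KEY}
  secret_key: ${MINIO_SECRET_KEY}
  insecure: true
EOF
}
```

`insecure: true` tells the S3 client to use plain HTTP, which matches the MinIO service in this stack; it would need to change if MinIO were put behind TLS.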

Usage Notes

  1. Thanos Query at http://localhost:10902
  2. Grafana at http://localhost:3000
  3. Alertmanager at http://localhost:9093
  4. Long-term storage in MinIO (console at http://localhost:9001)
  5. Global, deduplicated view across both Prometheus instances
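Grafana starts without any data source. One option is a provisioning file that points it at Thanos Query over the compose network, using http://thanos-query:9090 rather than the host port 10902. This sketch assumes you add a volume such as `./grafana/provisioning:/etc/grafana/provisioning:ro` to the grafana service, which the recipe does not include; the data source name is arbitrary.

```shell
#!/bin/sh
# Sketch: provision Thanos Query as Grafana's default Prometheus-type data source.
# Assumption: the grafana service mounts
#   ./grafana/provisioning:/etc/grafana/provisioning:ro
set -eu
mkdir -p grafana/provisioning/datasources
cat > grafana/provisioning/datasources/thanos.yml << 'EOF'
apiVersion: 1
datasources:
  - name: Thanos Query
    type: prometheus          # Thanos Query speaks the Prometheus HTTP API
    access: proxy             # Grafana's backend makes the requests
    url: http://thanos-query:9090
    isDefault: true
EOF
```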

Individual Services (9 services)

Copy individual services to mix and match with your existing compose files.

prometheus1
prometheus1:
  image: prom/prometheus:latest
  container_name: prometheus1
  command:
    - "--config.file=/etc/prometheus/prometheus.yml"
    - "--storage.tsdb.path=/prometheus"
    - "--storage.tsdb.retention.time=2h"
    - "--storage.tsdb.min-block-duration=2h"
    - "--storage.tsdb.max-block-duration=2h"
    - "--web.enable-lifecycle"
    - "--web.enable-admin-api"
  volumes:
    - ./prometheus1.yml:/etc/prometheus/prometheus.yml:ro
    - prometheus1-data:/prometheus
  ports:
    - "9090:9090"
  networks:
    - prometheus-network
  restart: unless-stopped
thanos-sidecar1
thanos-sidecar1:
  image: quay.io/thanos/thanos:latest
  container_name: thanos-sidecar1
  command:
    - sidecar
    - "--tsdb.path=/prometheus"
    - "--prometheus.url=http://prometheus1:9090"
    - "--grpc-address=0.0.0.0:10901"
    - "--http-address=0.0.0.0:10902"
    - "--objstore.config-file=/etc/thanos/bucket.yml"
  volumes:
    # Writable: the sidecar records uploaded blocks in thanos.shipper.json here
    - prometheus1-data:/prometheus
    - ./bucket.yml:/etc/thanos/bucket.yml:ro
  depends_on:
    - prometheus1
    - minio
  networks:
    - prometheus-network
  restart: unless-stopped
prometheus2
prometheus2:
  image: prom/prometheus:latest
  container_name: prometheus2
  command:
    - "--config.file=/etc/prometheus/prometheus.yml"
    - "--storage.tsdb.path=/prometheus"
    - "--storage.tsdb.retention.time=2h"
    - "--storage.tsdb.min-block-duration=2h"
    - "--storage.tsdb.max-block-duration=2h"
    - "--web.enable-lifecycle"
    - "--web.enable-admin-api"
  volumes:
    - ./prometheus2.yml:/etc/prometheus/prometheus.yml:ro
    - prometheus2-data:/prometheus
  ports:
    - "9091:9090"
  networks:
    - prometheus-network
  restart: unless-stopped
thanos-sidecar2
thanos-sidecar2:
  image: quay.io/thanos/thanos:latest
  container_name: thanos-sidecar2
  command:
    - sidecar
    - "--tsdb.path=/prometheus"
    - "--prometheus.url=http://prometheus2:9090"
    - "--grpc-address=0.0.0.0:10901"
    - "--http-address=0.0.0.0:10902"
    - "--objstore.config-file=/etc/thanos/bucket.yml"
  volumes:
    # Writable: the sidecar records uploaded blocks in thanos.shipper.json here
    - prometheus2-data:/prometheus
    - ./bucket.yml:/etc/thanos/bucket.yml:ro
  depends_on:
    - prometheus2
    - minio
  networks:
    - prometheus-network
  restart: unless-stopped
thanos-query
thanos-query:
  image: quay.io/thanos/thanos:latest
  container_name: thanos-query
  command:
    - query
    - "--http-address=0.0.0.0:9090"
    - "--grpc-address=0.0.0.0:10901"
    - "--store=thanos-sidecar1:10901"
    - "--store=thanos-sidecar2:10901"
    - "--store=thanos-store:10901"
    - "--query.replica-label=replica"
  ports:
    - "10902:9090"
  networks:
    - prometheus-network
  restart: unless-stopped
thanos-store
thanos-store:
  image: quay.io/thanos/thanos:latest
  container_name: thanos-store
  command:
    - store
    - "--grpc-address=0.0.0.0:10901"
    - "--http-address=0.0.0.0:10902"
    - "--data-dir=/data"
    - "--objstore.config-file=/etc/thanos/bucket.yml"
  volumes:
    - thanos-store-data:/data
    - ./bucket.yml:/etc/thanos/bucket.yml:ro
  depends_on:
    - minio
  networks:
    - prometheus-network
  restart: unless-stopped
minio
minio:
  image: minio/minio:latest
  container_name: thanos-minio
  command: server /data --console-address ":9001"
  environment:
    - MINIO_ROOT_USER=${MINIO_ACCESS_KEY}
    - MINIO_ROOT_PASSWORD=${MINIO_SECRET_KEY}
  volumes:
    - minio-data:/data
  ports:
    - "9000:9000"
    - "9001:9001"
  networks:
    - prometheus-network
  restart: unless-stopped
alertmanager
alertmanager:
  image: prom/alertmanager:latest
  container_name: alertmanager
  volumes:
    - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
    - alertmanager-data:/alertmanager
  ports:
    - "9093:9093"
  networks:
    - prometheus-network
  restart: unless-stopped
grafana
grafana:
  image: grafana/grafana:latest
  container_name: grafana
  environment:
    - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
  volumes:
    - grafana-data:/var/lib/grafana
  ports:
    - "3000:3000"
  networks:
    - prometheus-network
  restart: unless-stopped
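The alertmanager service bind-mounts ./alertmanager.yml, which the recipe does not provide. A minimal starting point routes every alert to a receiver with no integrations, so alerts show up in the UI at http://localhost:9093 without being delivered anywhere. The grouping and timing values are illustrative assumptions; add real receivers (email, Slack, webhook) before relying on notifications.

```shell
#!/bin/sh
# Sketch: write a minimal alertmanager.yml.
# The "default" receiver has no integrations, so alerts are only visible in the UI.
cat > alertmanager.yml << 'EOF'
route:
  receiver: default
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

receivers:
  - name: default
EOF
```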

Quick Start

terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  prometheus1:
    image: prom/prometheus:latest
    container_name: prometheus1
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=2h
      - --storage.tsdb.min-block-duration=2h
      - --storage.tsdb.max-block-duration=2h
      - --web.enable-lifecycle
      - --web.enable-admin-api
    volumes:
      - ./prometheus1.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus1-data:/prometheus
    ports:
      - "9090:9090"
    networks:
      - prometheus-network
    restart: unless-stopped

  thanos-sidecar1:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-sidecar1
    command:
      - sidecar
      - --tsdb.path=/prometheus
      - --prometheus.url=http://prometheus1:9090
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:10902
      - --objstore.config-file=/etc/thanos/bucket.yml
    volumes:
      - prometheus1-data:/prometheus
      - ./bucket.yml:/etc/thanos/bucket.yml:ro
    depends_on:
      - prometheus1
      - minio
    networks:
      - prometheus-network
    restart: unless-stopped

  prometheus2:
    image: prom/prometheus:latest
    container_name: prometheus2
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=2h
      - --storage.tsdb.min-block-duration=2h
      - --storage.tsdb.max-block-duration=2h
      - --web.enable-lifecycle
      - --web.enable-admin-api
    volumes:
      - ./prometheus2.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus2-data:/prometheus
    ports:
      - "9091:9090"
    networks:
      - prometheus-network
    restart: unless-stopped

  thanos-sidecar2:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-sidecar2
    command:
      - sidecar
      - --tsdb.path=/prometheus
      - --prometheus.url=http://prometheus2:9090
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:10902
      - --objstore.config-file=/etc/thanos/bucket.yml
    volumes:
      - prometheus2-data:/prometheus
      - ./bucket.yml:/etc/thanos/bucket.yml:ro
    depends_on:
      - prometheus2
      - minio
    networks:
      - prometheus-network
    restart: unless-stopped

  thanos-query:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-query
    command:
      - query
      - --http-address=0.0.0.0:9090
      - --grpc-address=0.0.0.0:10901
      - --store=thanos-sidecar1:10901
      - --store=thanos-sidecar2:10901
      - --store=thanos-store:10901
      - --query.replica-label=replica
    ports:
      - "10902:9090"
    networks:
      - prometheus-network
    restart: unless-stopped

  thanos-store:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-store
    command:
      - store
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:10902
      - --data-dir=/data
      - --objstore.config-file=/etc/thanos/bucket.yml
    volumes:
      - thanos-store-data:/data
      - ./bucket.yml:/etc/thanos/bucket.yml:ro
    depends_on:
      - minio
    networks:
      - prometheus-network
    restart: unless-stopped

  minio:
    image: minio/minio:latest
    container_name: thanos-minio
    command: server /data --console-address ":9001"
    environment:
      - MINIO_ROOT_USER=${MINIO_ACCESS_KEY}
      - MINIO_ROOT_PASSWORD=${MINIO_SECRET_KEY}
    volumes:
      - minio-data:/data
    ports:
      - "9000:9000"
      - "9001:9001"
    networks:
      - prometheus-network
    restart: unless-stopped

  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
      - alertmanager-data:/alertmanager
    ports:
      - "9093:9093"
    networks:
      - prometheus-network
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
    volumes:
      - grafana-data:/var/lib/grafana
    ports:
      - "3000:3000"
    networks:
      - prometheus-network
    restart: unless-stopped

volumes:
  prometheus1-data:
  prometheus2-data:
  thanos-store-data:
  minio-data:
  alertmanager-data:
  grafana-data:

networks:
  prometheus-network:
    driver: bridge
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# Prometheus HA with Thanos
GRAFANA_PASSWORD=secure_grafana_password
MINIO_ACCESS_KEY=thanos
MINIO_SECRET_KEY=thanos_secret

# Create bucket.yml:
# type: S3
# config:
#   bucket: thanos
#   endpoint: minio:9000
#   access_key: thanos
#   secret_key: thanos_secret
#   insecure: true
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
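MinIO starts with no buckets, so the `thanos` bucket named in bucket.yml has to exist before the sidecars can upload blocks and the store gateway can read them. One approach, sketched here, runs the MinIO client image (minio/mc) inside the minio container's network namespace so it reaches the server at localhost:9000. It relies on the fixed `container_name: thanos-minio` and on the `MC_HOST_<alias>` environment variable that mc reads for credentials. The function name and the script name in the usage comment are hypothetical, and credentials containing URL-reserved characters would need percent-encoding.

```shell
#!/bin/sh
# Hypothetical helper: create the "thanos" bucket named in bucket.yml.
# Assumes ./.env holds the MinIO credentials and the minio service
# (container_name: thanos-minio) has already been started.
create_thanos_bucket() {
  set -a
  . ./.env
  set +a
  attempts=0
  # Share thanos-minio's network namespace so mc reaches MinIO on localhost:9000.
  # MC_HOST_local defines the "local" alias; --ignore-existing makes re-runs safe.
  until docker run --rm \
      --network container:thanos-minio \
      -e "MC_HOST_local=http://${MINIO_ACCESS_KEY}:${MINIO_SECRET_KEY}@localhost:9000" \
      minio/mc mb --ignore-existing local/thanos; do
    attempts=$((attempts + 1))
    if [ "$attempts" -ge 10 ]; then
      echo "could not create bucket after $attempts attempts" >&2
      return 1
    fi
    sleep 3   # MinIO may still be starting up
  done
}

# Usage, from the project directory (assuming this is saved as create-bucket.sh),
# in place of step 3:
#   . ./create-bucket.sh
#   docker compose up -d minio && create_thanos_bucket && docker compose up -d
```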

One-Liner

Run this command to download and set up the recipe in one step:

terminal
1curl -fsSL https://docker.recipes/api/recipes/prometheus-stack-ha/run | bash

Troubleshooting

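  • Thanos sidecar 'connection refused' to MinIO: Check that the minio container is running and that bucket.yml uses endpoint minio:9000 with insecure: true. Authentication errors, by contrast, mean the MINIO_ACCESS_KEY and MINIO_SECRET_KEY values in .env don't match the credentials in bucket.yml
  • Prometheus 'context deadline exceeded' errors: Usually a scrape target did not respond within scrape_timeout. Check the target on the Prometheus Targets page and, for slow endpoints, raise scrape_timeout (it cannot exceed scrape_interval)
  • Thanos Query returns no data: Check that each --store flag on thanos-query matches the gRPC address (port 10901) of a sidecar or the store gateway, and that every endpoint shows as up on the Stores page of the Thanos Query UI. A sidecar also refuses to start if its Prometheus instance has no external_labels configured
  • MinIO bucket access denied: MinIO does not create buckets automatically; ensure the bucket named in bucket.yml exists and that the credentials have read/write permission on it
  • Grafana cannot connect to Thanos Query: Inside the compose network, use http://thanos-query:9090 as the data source URL; localhost:10902 only works from the Docker host
  • High memory usage in Prometheus: Lengthen the scrape interval (scrape less often) or use metric relabeling to drop high-cardinality series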
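For the credential-related items above, a quick way to spot a mismatch is to compare bucket.yml with .env directly. This sketch assumes both files use the simple layouts shown in the .env template (plain KEY=value lines, and unquoted access_key/secret_key fields with no trailing comments); the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: check that bucket.yml uses the MinIO credentials from .env.
# Run it from the project directory.
check_minio_creds() {
  env_key=$(sed -n 's/^MINIO_ACCESS_KEY=//p' .env)
  env_secret=$(sed -n 's/^MINIO_SECRET_KEY=//p' .env)
  # Strip the indentation and "key:" prefix to get the bare values.
  yml_key=$(sed -n 's/^[[:space:]]*access_key:[[:space:]]*//p' bucket.yml)
  yml_secret=$(sed -n 's/^[[:space:]]*secret_key:[[:space:]]*//p' bucket.yml)
  if [ "$env_key" = "$yml_key" ] && [ "$env_secret" = "$yml_secret" ]; then
    echo "bucket.yml credentials match .env"
  else
    echo "credential mismatch between .env and bucket.yml" >&2
    return 1
  fi
}
```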

Components

prometheus, thanos-sidecar, thanos-query, thanos-store, minio, alertmanager, grafana

Tags

#prometheus #thanos #monitoring #ha #metrics

Category

Monitoring & Observability