Prometheus HA Stack
High-availability Prometheus setup with Thanos for long-term storage.
Overview
Prometheus is an open-source monitoring and alerting toolkit. Originally developed at SoundCloud in 2012, it popularized the pull-based metrics collection model and introduced PromQL, a query language for time-series data. Its dimensional data model, in which every series is identified by a metric name plus a set of labels, allows flexible aggregation and filtering of metrics across dynamic environments such as Kubernetes clusters and microservices architectures.
This high-availability stack combines Prometheus with Thanos to address three limitations of a standalone Prometheus deployment: storage bounded by local disk, no global query view across instances, and a single point of failure. A Thanos sidecar runs alongside each Prometheus instance, uploading completed TSDB blocks to MinIO object storage and exposing the instance's data over gRPC to Thanos Query, which merges all sources behind a single query interface. The store gateway serves historical data directly from object storage, Alertmanager handles notification routing, and Grafana provides visualization on top of Thanos Query.
This configuration suits organizations running multiple Prometheus instances in production that need long-term retention, global queries across instances or data centers, and redundant metric collection. Platform engineering teams, SRE organizations, and enterprises with compliance requirements for long-term metrics retention can use it to keep observability at scale without the cost of a vendor-managed service. Note that Thanos Query, the store gateway, MinIO, and Alertmanager each run as a single instance here, so they remain single points of failure until they are replicated as well.
Key Features
- Dual Prometheus instances, each with a Thanos sidecar, so metric collection continues if either instance fails
- Long-term metrics retention through the Thanos store gateway and MinIO object storage, limited only by bucket capacity
- Global querying across all Prometheus instances via Thanos Query with deduplication
- S3-compatible object storage with MinIO for cost-effective long-term data persistence
- Block compaction and downsampling once a Thanos Compactor (thanos compact) is added; this compose file does not include one
- Replica label deduplication to handle identical metrics from multiple Prometheus instances
- Integrated Alertmanager for centralized alert routing and notification management
- Grafana integration with Thanos Query for unified dashboards across all data sources
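Replica-label deduplication only works if both Prometheus instances carry external labels and the replica label differs between them; the Thanos sidecar also expects Prometheus to have external labels configured. The compose file mounts prometheus1.yml and prometheus2.yml, but the recipe does not ship them, so the following is a minimal sketch that generates both. The cluster label value, the 15s intervals, and the scrape jobs are assumptions to adapt to your environment.

```bash
#!/usr/bin/env bash
# Sketch: generate prometheus1.yml and prometheus2.yml for the two replicas.
# Only the `replica` external label differs; thanos-query strips it via
# --query.replica-label=replica and merges the two copies of each series.
# The `cluster` value and the scrape jobs below are illustrative assumptions.
set -euo pipefail

for n in 1 2; do
  # Unquoted EOF so ${n} expands inside the heredoc
  cat > "prometheus${n}.yml" <<EOF
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: demo # hypothetical; must be identical on both replicas
    replica: prometheus${n} # must differ per replica for deduplication

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: thanos-sidecar
    static_configs:
      - targets: ["thanos-sidecar${n}:10902"]
EOF
done
```

Because both instances run with --web.enable-lifecycle, later edits can be applied without a restart by sending a POST to http://localhost:9090/-/reload (and to port 9091 for the second instance).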
Common Use Cases
- Multi-region Kubernetes monitoring with global metrics visibility across clusters
- Enterprise infrastructure monitoring requiring years of metrics retention for compliance
- Platform engineering teams managing dozens of microservices across multiple environments
- Organizations migrating from vendor solutions like Datadog to reduce monitoring costs
- High-traffic SaaS applications that need redundant metric collection
- Financial services requiring long-term metrics storage for regulatory compliance audits
- Research institutions analyzing historical performance trends over extended periods
Prerequisites
- Docker host with minimum 6GB RAM to support all components (Prometheus instances need 1GB+ each)
- At least 20GB available storage for MinIO object storage and local Prometheus data
- Network access to ports 3000, 9000-9001, 9090-9091, 9093, and 10902 for the service interfaces
- Understanding of PromQL query language and Prometheus configuration syntax
- Familiarity with time-series data concepts and metrics cardinality considerations
- Basic knowledge of object storage concepts and S3-compatible APIs for troubleshooting
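The RAM and disk figures above can be checked roughly before deploying. This is a sketch, not part of the recipe: it assumes a Linux host (it reads /proc/meminfo), treats GB as GiB, and measures free space on the filesystem of a path you pass in. The default is the current directory, although Docker volumes usually live under /var/lib/docker.

```bash
#!/usr/bin/env bash
# Preflight sketch for the prerequisites above (assumes a Linux host).
# GB is treated as GiB; MemTotal is usually a little below installed RAM.

# meets_minimum HAVE_KB NEED_GB -> success if HAVE_KB >= NEED_GB GiB
meets_minimum() {
  local have_kb=$1 need_gb=$2
  (( have_kb >= need_gb * 1024 * 1024 ))
}

# preflight [PATH] -> report RAM, free disk on PATH's filesystem, Docker Compose
preflight() {
  local path=${1:-.} mem_kb disk_kb
  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
  # Column 4 of POSIX df output is available space in 1K blocks
  disk_kb=$(df -Pk "$path" 2>/dev/null | awk 'NR==2 {print $4}')

  if meets_minimum "${mem_kb:-0}" 6; then echo "RAM: OK"; else echo "RAM: below 6 GB"; fi
  if meets_minimum "${disk_kb:-0}" 20; then echo "Disk: OK"; else echo "Disk: below 20 GB free"; fi

  if docker compose version >/dev/null 2>&1; then
    echo "Docker Compose: OK"
  else
    echo "Docker Compose: not found"
  fi
}

preflight "$@"
```

For example, running it with /var/lib/docker as the argument checks the disk that will actually hold the named volumes.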
For development & testing. Review security settings, change default credentials, and test thoroughly before production use.
docker-compose.yml
```yaml
services:
  prometheus1:
    image: prom/prometheus:latest
    container_name: prometheus1
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=2h
      - --storage.tsdb.min-block-duration=2h
      - --storage.tsdb.max-block-duration=2h
      - --web.enable-lifecycle
      - --web.enable-admin-api
    volumes:
      - ./prometheus1.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus1-data:/prometheus
    ports:
      - "9090:9090"
    networks:
      - prometheus-network
    restart: unless-stopped

  thanos-sidecar1:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-sidecar1
    command:
      - sidecar
      - --tsdb.path=/prometheus
      - --prometheus.url=http://prometheus1:9090
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:10902
      - --objstore.config-file=/etc/thanos/bucket.yml
    volumes:
      - prometheus1-data:/prometheus:ro
      - ./bucket.yml:/etc/thanos/bucket.yml:ro
    depends_on:
      - prometheus1
      - minio
    networks:
      - prometheus-network
    restart: unless-stopped

  prometheus2:
    image: prom/prometheus:latest
    container_name: prometheus2
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=2h
      - --storage.tsdb.min-block-duration=2h
      - --storage.tsdb.max-block-duration=2h
      - --web.enable-lifecycle
      - --web.enable-admin-api
    volumes:
      - ./prometheus2.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus2-data:/prometheus
    ports:
      - "9091:9090"
    networks:
      - prometheus-network
    restart: unless-stopped

  thanos-sidecar2:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-sidecar2
    command:
      - sidecar
      - --tsdb.path=/prometheus
      - --prometheus.url=http://prometheus2:9090
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:10902
      - --objstore.config-file=/etc/thanos/bucket.yml
    volumes:
      - prometheus2-data:/prometheus:ro
      - ./bucket.yml:/etc/thanos/bucket.yml:ro
    depends_on:
      - prometheus2
      - minio
    networks:
      - prometheus-network
    restart: unless-stopped

  thanos-query:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-query
    command:
      - query
      - --http-address=0.0.0.0:9090
      - --grpc-address=0.0.0.0:10901
      - --store=thanos-sidecar1:10901
      - --store=thanos-sidecar2:10901
      - --store=thanos-store:10901
      - --query.replica-label=replica
    ports:
      - "10902:9090"
    networks:
      - prometheus-network
    restart: unless-stopped

  thanos-store:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-store
    command:
      - store
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:10902
      - --data-dir=/data
      - --objstore.config-file=/etc/thanos/bucket.yml
    volumes:
      - thanos-store-data:/data
      - ./bucket.yml:/etc/thanos/bucket.yml:ro
    depends_on:
      - minio
    networks:
      - prometheus-network
    restart: unless-stopped

  minio:
    image: minio/minio:latest
    container_name: thanos-minio
    command: server /data --console-address ":9001"
    environment:
      - MINIO_ROOT_USER=${MINIO_ACCESS_KEY}
      - MINIO_ROOT_PASSWORD=${MINIO_SECRET_KEY}
    volumes:
      - minio-data:/data
    ports:
      - "9000:9000"
      - "9001:9001"
    networks:
      - prometheus-network
    restart: unless-stopped

  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
      - alertmanager-data:/alertmanager
    ports:
      - "9093:9093"
    networks:
      - prometheus-network
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
    volumes:
      - grafana-data:/var/lib/grafana
    ports:
      - "3000:3000"
    networks:
      - prometheus-network
    restart: unless-stopped

volumes:
  prometheus1-data:
  prometheus2-data:
  thanos-store-data:
  minio-data:
  alertmanager-data:
  grafana-data:

networks:
  prometheus-network:
    driver: bridge
```
.env Template
.env
```bash
# Prometheus HA with Thanos
GRAFANA_PASSWORD=secure_grafana_password
MINIO_ACCESS_KEY=thanos
MINIO_SECRET_KEY=thanos_secret

# Create bucket.yml:
# type: S3
# config:
#   bucket: thanos
#   endpoint: minio:9000
#   access_key: thanos
#   secret_key: thanos_secret
#   insecure: true
```
Usage Notes
- Thanos Query at http://localhost:10902
- Grafana at http://localhost:3000
- Alertmanager at http://localhost:9093
- Long-term storage in MinIO (console at http://localhost:9001)
- Global view across Prometheus instances
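To confirm that the endpoints above are actually serving, a small probe script can hit each service's health or readiness endpoint. This is a sketch assuming curl on the host and the default port mappings from docker-compose.yml; the paths used (/-/ready, /api/health, /minio/health/live) are the standard health endpoints of these projects, while the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: probe the published endpoints after `docker compose up -d`.
# Assumes curl and the default host port mappings from docker-compose.yml.

# check_url NAME URL -> prints OK or FAIL depending on an HTTP 2xx/3xx reply
check_url() {
  local name=$1 url=$2
  if curl -fsS -o /dev/null --max-time 3 "$url" 2>/dev/null; then
    echo "OK   $name ($url)"
  else
    echo "FAIL $name ($url)"
  fi
}

check_all() {
  check_url "Prometheus 1" http://localhost:9090/-/ready
  check_url "Prometheus 2" http://localhost:9091/-/ready
  check_url "Thanos Query" http://localhost:10902/-/ready
  check_url "Alertmanager" http://localhost:9093/-/ready
  check_url "Grafana"      http://localhost:3000/api/health
  check_url "MinIO"        http://localhost:9000/minio/health/live
}

check_all
```

A FAIL line for a single service narrows the search before reading docker compose logs for that container.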
Individual Services (9 services)
Copy individual services to mix and match with your existing compose files.
prometheus1
```yaml
prometheus1:
  image: prom/prometheus:latest
  container_name: prometheus1
  command:
    - "--config.file=/etc/prometheus/prometheus.yml"
    - "--storage.tsdb.path=/prometheus"
    - "--storage.tsdb.retention.time=2h"
    - "--storage.tsdb.min-block-duration=2h"
    - "--storage.tsdb.max-block-duration=2h"
    - "--web.enable-lifecycle"
    - "--web.enable-admin-api"
  volumes:
    - ./prometheus1.yml:/etc/prometheus/prometheus.yml:ro
    - prometheus1-data:/prometheus
  ports:
    - "9090:9090"
  networks:
    - prometheus-network
  restart: unless-stopped
```
thanos-sidecar1
```yaml
thanos-sidecar1:
  image: quay.io/thanos/thanos:latest
  container_name: thanos-sidecar1
  command:
    - sidecar
    - "--tsdb.path=/prometheus"
    - "--prometheus.url=http://prometheus1:9090"
    - "--grpc-address=0.0.0.0:10901"
    - "--http-address=0.0.0.0:10902"
    - "--objstore.config-file=/etc/thanos/bucket.yml"
  volumes:
    - prometheus1-data:/prometheus:ro
    - ./bucket.yml:/etc/thanos/bucket.yml:ro
  depends_on:
    - prometheus1
    - minio
  networks:
    - prometheus-network
  restart: unless-stopped
```
prometheus2
```yaml
prometheus2:
  image: prom/prometheus:latest
  container_name: prometheus2
  command:
    - "--config.file=/etc/prometheus/prometheus.yml"
    - "--storage.tsdb.path=/prometheus"
    - "--storage.tsdb.retention.time=2h"
    - "--storage.tsdb.min-block-duration=2h"
    - "--storage.tsdb.max-block-duration=2h"
    - "--web.enable-lifecycle"
    - "--web.enable-admin-api"
  volumes:
    - ./prometheus2.yml:/etc/prometheus/prometheus.yml:ro
    - prometheus2-data:/prometheus
  ports:
    - "9091:9090"
  networks:
    - prometheus-network
  restart: unless-stopped
```
thanos-sidecar2
```yaml
thanos-sidecar2:
  image: quay.io/thanos/thanos:latest
  container_name: thanos-sidecar2
  command:
    - sidecar
    - "--tsdb.path=/prometheus"
    - "--prometheus.url=http://prometheus2:9090"
    - "--grpc-address=0.0.0.0:10901"
    - "--http-address=0.0.0.0:10902"
    - "--objstore.config-file=/etc/thanos/bucket.yml"
  volumes:
    - prometheus2-data:/prometheus:ro
    - ./bucket.yml:/etc/thanos/bucket.yml:ro
  depends_on:
    - prometheus2
    - minio
  networks:
    - prometheus-network
  restart: unless-stopped
```
thanos-query
```yaml
thanos-query:
  image: quay.io/thanos/thanos:latest
  container_name: thanos-query
  command:
    - query
    - "--http-address=0.0.0.0:9090"
    - "--grpc-address=0.0.0.0:10901"
    - "--store=thanos-sidecar1:10901"
    - "--store=thanos-sidecar2:10901"
    - "--store=thanos-store:10901"
    - "--query.replica-label=replica"
  ports:
    - "10902:9090"
  networks:
    - prometheus-network
  restart: unless-stopped
```
thanos-store
```yaml
thanos-store:
  image: quay.io/thanos/thanos:latest
  container_name: thanos-store
  command:
    - store
    - "--grpc-address=0.0.0.0:10901"
    - "--http-address=0.0.0.0:10902"
    - "--data-dir=/data"
    - "--objstore.config-file=/etc/thanos/bucket.yml"
  volumes:
    - thanos-store-data:/data
    - ./bucket.yml:/etc/thanos/bucket.yml:ro
  depends_on:
    - minio
  networks:
    - prometheus-network
  restart: unless-stopped
```
minio
```yaml
minio:
  image: minio/minio:latest
  container_name: thanos-minio
  command: server /data --console-address ":9001"
  environment:
    - MINIO_ROOT_USER=${MINIO_ACCESS_KEY}
    - MINIO_ROOT_PASSWORD=${MINIO_SECRET_KEY}
  volumes:
    - minio-data:/data
  ports:
    - "9000:9000"
    - "9001:9001"
  networks:
    - prometheus-network
  restart: unless-stopped
```
alertmanager
```yaml
alertmanager:
  image: prom/alertmanager:latest
  container_name: alertmanager
  volumes:
    - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
    - alertmanager-data:/alertmanager
  ports:
    - "9093:9093"
  networks:
    - prometheus-network
  restart: unless-stopped
```
grafana
```yaml
grafana:
  image: grafana/grafana:latest
  container_name: grafana
  environment:
    - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
  volumes:
    - grafana-data:/var/lib/grafana
  ports:
    - "3000:3000"
  networks:
    - prometheus-network
  restart: unless-stopped
```
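The grafana service starts with no data source configured. One way to point it at Thanos Query is Grafana's file-based provisioning. This sketch assumes you also add a bind mount ./grafana/provisioning:/etc/grafana/provisioning:ro to the grafana service, which is not part of this recipe; the data source name "Thanos Query" is arbitrary.

```bash
#!/usr/bin/env bash
# Sketch: provision Thanos Query as Grafana's default Prometheus data source.
# Assumption: the grafana service mounts
#   ./grafana/provisioning:/etc/grafana/provisioning:ro
# (not included in the compose file above).
set -euo pipefail

mkdir -p grafana/provisioning/datasources
cat > grafana/provisioning/datasources/thanos.yml <<'EOF'
apiVersion: 1
datasources:
  - name: Thanos Query
    type: prometheus
    access: proxy
    # Container-to-container address; localhost:10902 only works from the host
    url: http://thanos-query:9090
    isDefault: true
EOF
```

Thanos Query implements the Prometheus HTTP API, which is why the standard prometheus data source type is used; after adding the mount, recreate the container with docker compose up -d grafana.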
Quick Start
terminal
```bash
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  prometheus1:
    image: prom/prometheus:latest
    container_name: prometheus1
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=2h
      - --storage.tsdb.min-block-duration=2h
      - --storage.tsdb.max-block-duration=2h
      - --web.enable-lifecycle
      - --web.enable-admin-api
    volumes:
      - ./prometheus1.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus1-data:/prometheus
    ports:
      - "9090:9090"
    networks:
      - prometheus-network
    restart: unless-stopped

  thanos-sidecar1:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-sidecar1
    command:
      - sidecar
      - --tsdb.path=/prometheus
      - --prometheus.url=http://prometheus1:9090
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:10902
      - --objstore.config-file=/etc/thanos/bucket.yml
    volumes:
      - prometheus1-data:/prometheus:ro
      - ./bucket.yml:/etc/thanos/bucket.yml:ro
    depends_on:
      - prometheus1
      - minio
    networks:
      - prometheus-network
    restart: unless-stopped

  prometheus2:
    image: prom/prometheus:latest
    container_name: prometheus2
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=2h
      - --storage.tsdb.min-block-duration=2h
      - --storage.tsdb.max-block-duration=2h
      - --web.enable-lifecycle
      - --web.enable-admin-api
    volumes:
      - ./prometheus2.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus2-data:/prometheus
    ports:
      - "9091:9090"
    networks:
      - prometheus-network
    restart: unless-stopped

  thanos-sidecar2:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-sidecar2
    command:
      - sidecar
      - --tsdb.path=/prometheus
      - --prometheus.url=http://prometheus2:9090
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:10902
      - --objstore.config-file=/etc/thanos/bucket.yml
    volumes:
      - prometheus2-data:/prometheus:ro
      - ./bucket.yml:/etc/thanos/bucket.yml:ro
    depends_on:
      - prometheus2
      - minio
    networks:
      - prometheus-network
    restart: unless-stopped

  thanos-query:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-query
    command:
      - query
      - --http-address=0.0.0.0:9090
      - --grpc-address=0.0.0.0:10901
      - --store=thanos-sidecar1:10901
      - --store=thanos-sidecar2:10901
      - --store=thanos-store:10901
      - --query.replica-label=replica
    ports:
      - "10902:9090"
    networks:
      - prometheus-network
    restart: unless-stopped

  thanos-store:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-store
    command:
      - store
      - --grpc-address=0.0.0.0:10901
      - --http-address=0.0.0.0:10902
      - --data-dir=/data
      - --objstore.config-file=/etc/thanos/bucket.yml
    volumes:
      - thanos-store-data:/data
      - ./bucket.yml:/etc/thanos/bucket.yml:ro
    depends_on:
      - minio
    networks:
      - prometheus-network
    restart: unless-stopped

  minio:
    image: minio/minio:latest
    container_name: thanos-minio
    command: server /data --console-address ":9001"
    environment:
      - MINIO_ROOT_USER=${MINIO_ACCESS_KEY}
      - MINIO_ROOT_PASSWORD=${MINIO_SECRET_KEY}
    volumes:
      - minio-data:/data
    ports:
      - "9000:9000"
      - "9001:9001"
    networks:
      - prometheus-network
    restart: unless-stopped

  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
      - alertmanager-data:/alertmanager
    ports:
      - "9093:9093"
    networks:
      - prometheus-network
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
    volumes:
      - grafana-data:/var/lib/grafana
    ports:
      - "3000:3000"
    networks:
      - prometheus-network
    restart: unless-stopped

volumes:
  prometheus1-data:
  prometheus2-data:
  thanos-store-data:
  minio-data:
  alertmanager-data:
  grafana-data:

networks:
  prometheus-network:
    driver: bridge
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# Prometheus HA with Thanos
GRAFANA_PASSWORD=secure_grafana_password
MINIO_ACCESS_KEY=thanos
MINIO_SECRET_KEY=thanos_secret

# Create bucket.yml:
# type: S3
# config:
#   bucket: thanos
#   endpoint: minio:9000
#   access_key: thanos
#   secret_key: thanos_secret
#   insecure: true
EOF

# 3. Create bucket.yml (mounted by both sidecars and the store gateway).
#    Keep access_key/secret_key in sync with MINIO_ACCESS_KEY/MINIO_SECRET_KEY in .env.
#    The "thanos" bucket must also exist in MinIO (e.g. create it in the
#    console at http://localhost:9001 once MinIO is running).
cat > bucket.yml << 'EOF'
type: S3
config:
  bucket: thanos
  endpoint: minio:9000
  access_key: thanos
  secret_key: thanos_secret
  insecure: true
EOF

# prometheus1.yml, prometheus2.yml and alertmanager.yml must also exist in
# this directory; this recipe does not include them.

# 4. Start the services
docker compose up -d

# 5. View logs
docker compose logs -f
```
One-Liner
Run this command to download and set up the recipe in one step:
terminal
```bash
curl -fsSL https://docker.recipes/api/recipes/prometheus-stack-ha/run | bash
```
Troubleshooting
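- Thanos sidecar 'connection refused' to MinIO: Check that the minio container is running and that endpoint in bucket.yml is minio:9000. Authentication errors, rather than refused connections, point to a mismatch between MINIO_ACCESS_KEY/MINIO_SECRET_KEY and the credentials in bucket.yml
- Prometheus 'context deadline exceeded' errors: A scrape target did not respond within scrape_timeout; check that the target is reachable and, if it is simply slow, raise scrape_timeout in prometheus1.yml and prometheus2.yml
- Thanos Query returns no data: Check that the --store flags in the thanos-query command point at the gRPC port (10901) of both sidecars and the store gateway
- MinIO bucket access denied: Ensure the bucket named in bucket.yml exists (it can be created in the MinIO console at http://localhost:9001) and that the credentials have read/write permissions
- Grafana cannot connect to Thanos Query: Use http://thanos-query:9090 as the data source URL; Grafana reaches Thanos Query over the Docker network, and http://localhost:10902 works only from the host
- High memory usage in Prometheus: Reduce the number of active series with metric relabeling (dropping high-cardinality labels or unused metrics), or scrape less often by raising scrape_interval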
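For the MinIO issues above, two helper functions can save some manual checking. This is a sketch with hypothetical function names, not part of the recipe: check_bucket_creds compares the credentials in .env with those in bucket.yml (assuming unquoted values), and create_thanos_bucket creates the thanos bucket with the minio/mc image, defining an mc alias through the MC_HOST_local environment variable. The network name assumes Compose's default project name (the lowercased directory name), so confirm it with docker network ls; a secret containing characters such as @ or / would need URL-encoding.

```bash
#!/usr/bin/env bash
# Sketch of MinIO troubleshooting helpers (hypothetical names, not part of the recipe).

# env_value KEY FILE -> value of KEY=value lines in a .env file
env_value() { sed -n "s/^$1=//p" "$2"; }

# yml_value KEY FILE -> value of "key: value" lines (unquoted values assumed)
yml_value() { sed -n "s/^[[:space:]]*$1:[[:space:]]*//p" "$2"; }

# check_bucket_creds [ENV_FILE] [BUCKET_FILE] -> fails if the credentials differ
check_bucket_creds() {
  local env_file=${1:-.env} bucket_file=${2:-bucket.yml}
  if [[ "$(env_value MINIO_ACCESS_KEY "$env_file")" == "$(yml_value access_key "$bucket_file")" &&
        "$(env_value MINIO_SECRET_KEY "$env_file")" == "$(yml_value secret_key "$bucket_file")" ]]; then
    echo "credentials match"
  else
    echo "credentials differ between $env_file and $bucket_file" >&2
    return 1
  fi
}

# create_thanos_bucket [ENV_FILE] [NETWORK] -> create the "thanos" bucket via mc
create_thanos_bucket() {
  local env_file=${1:-.env}
  local default_net
  default_net="$(basename "$PWD" | tr '[:upper:]' '[:lower:]')_prometheus-network"
  local network=${2:-$default_net}
  local key secret
  key=$(env_value MINIO_ACCESS_KEY "$env_file")
  secret=$(env_value MINIO_SECRET_KEY "$env_file")
  # --ignore-existing makes the call safe to repeat
  docker run --rm --network "$network" \
    -e "MC_HOST_local=http://${key}:${secret}@minio:9000" \
    minio/mc mb --ignore-existing local/thanos
}
```

Running check_bucket_creds before docker compose up catches the most common cause of sidecar upload failures; create_thanos_bucket needs the minio container to be running.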
Components
prometheus, thanos-sidecar, thanos-query, thanos-store, minio, alertmanager, grafana
Tags
#prometheus #thanos #monitoring #ha #metrics
Category
Monitoring & Observability