Thanos
Highly available Prometheus setup with long-term storage capabilities.
Overview
Thanos is a set of components that can be composed into a highly available metrics system with effectively unlimited storage capacity, and it can be added on top of existing Prometheus deployments. Originally developed at Improbable, Thanos addresses Prometheus's limitations around high availability, long-term retention, and global querying by keeping historical blocks in object storage and scaling its components horizontally.
This stack combines Prometheus for metrics collection with the Thanos Sidecar and Query components. The Sidecar runs alongside Prometheus and serves its local data to Thanos Query in real time. When it is given an object storage configuration, it also uploads completed 2-hour TSDB blocks to a bucket (MinIO in the complete setup). The compose file shown in this recipe does not include that configuration or a MinIO service, so uploads stay disabled until you add them. Thanos Query acts as a global query layer: it fans out to multiple Prometheus instances through their sidecars and, once a Store Gateway is added, to historical data in object storage, presenting a single PromQL interface across time ranges and locations.
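The object storage configuration mentioned above is a small YAML file. A minimal sketch for a MinIO backend might look like the following. The file name `bucket.yml`, the service address `minio:9000`, the bucket name `thanos`, and the credentials are all assumptions for illustration and are not part of this recipe.

```yaml
# bucket.yml (hypothetical file name): Thanos object storage configuration
type: S3                   # MinIO is accessed through the S3-compatible client
config:
  bucket: thanos           # assumed bucket name; it must already exist in MinIO
  endpoint: minio:9000     # assumed MinIO service name and port
  access_key: CHANGE_ME    # placeholder credential
  secret_key: CHANGE_ME    # placeholder credential
  insecure: true           # plain HTTP, assumed here for a local test setup only
```

The Sidecar reads such a file through its `--objstore.config-file` flag.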
Platform engineers and SREs managing multi-cluster Kubernetes environments, large-scale microservices, or any infrastructure that needs long-term metrics retention should consider this stack. A standalone Prometheus deployment is limited by local storage and a single instance. Thanos adds horizontal scaling, cross-datacenter queries, and cost-effective long-term retention on object storage backends, which suits organizations outgrowing a single Prometheus instance.
Key Features
- Global query interface via Thanos Query for federated metrics across multiple Prometheus instances
- Unlimited long-term storage retention using object storage backends like MinIO or cloud storage
- Horizontal scaling of Prometheus deployments without losing query capabilities across instances
- Automatic upload of TSDB blocks to object storage, with Prometheus min and max block duration both set to 2h so that local compaction is disabled
- PromQL compatibility maintained across historical and real-time data through a unified query layer
- High availability setup supporting multiple Prometheus replicas with query-time deduplication
- Cost-effective storage tiering with hot data in Prometheus and cold data in object storage
- Cross-cluster and multi-region metrics federation with single query endpoint
Common Use Cases
- Multi-cluster Kubernetes monitoring with centralized metrics querying across all clusters
- Long-term capacity planning and trend analysis requiring years of historical metrics data
- Highly available Prometheus setups with multiple replicas and automatic failover capabilities
- Cost optimization for metrics storage by moving historical data to cheaper object storage
- Cross-datacenter monitoring with unified dashboards spanning multiple geographic regions
- Compliance requirements demanding long-term metrics retention for auditing purposes
- Large-scale microservices monitoring where single Prometheus instances hit storage limits
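For the multi-cluster and cross-datacenter cases, Thanos Query can be pointed at several sidecars by repeating the `--endpoint` flag. The fragment below is a sketch only: the two `example` host names are hypothetical and stand in for sidecars whose gRPC port is reachable from this host.

```yaml
services:
  thanos-query:
    image: quay.io/thanos/thanos:latest
    command:
      - query
      - --endpoint=thanos-sidecar:10901             # local sidecar from this recipe
      - --endpoint=sidecar.cluster-b.example:10901  # hypothetical remote sidecar
      - --endpoint=sidecar.cluster-c.example:10901  # hypothetical remote sidecar
```

Each Prometheus behind these sidecars should carry distinct external labels so that the merged series can be told apart.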
Prerequisites
- Minimum 2GB RAM available for Prometheus and Thanos components combined
- Object storage backend (MinIO, S3, or GCS) for long-term block storage configured separately
- Ports 9090 (Prometheus) and 10902 (Thanos Query HTTP) available on the host; port 10901 (Thanos Sidecar gRPC) is used only inside the compose network
- Understanding of PromQL and Prometheus architecture for effective troubleshooting
- Knowledge of TSDB block structure and compaction processes for storage optimization
- Sufficient disk space for local Prometheus storage before blocks are uploaded to object storage
For development & testing. Review security settings, change default credentials, and test thoroughly before production use.
docker-compose.yml
```yaml
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.max-block-duration=2h
      - --storage.tsdb.min-block-duration=2h
    volumes:
      - ./prometheus:/etc/prometheus
      - prometheus_data:/prometheus
    ports:
      - "9090:9090"
    networks:
      - thanos-network

  thanos-sidecar:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-sidecar
    command:
      - sidecar
      - --prometheus.url=http://prometheus:9090
      - --tsdb.path=/prometheus
    volumes:
      - prometheus_data:/prometheus
    depends_on:
      - prometheus
    networks:
      - thanos-network

  thanos-query:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-query
    command:
      - query
      - --endpoint=thanos-sidecar:10901
    ports:
      - "10902:10902"
    depends_on:
      - thanos-sidecar
    networks:
      - thanos-network

volumes:
  prometheus_data:

networks:
  thanos-network:
    driver: bridge
```
.env Template
```
# Thanos configuration
```
Usage Notes
- Docs: https://thanos.io/tip/thanos/getting-started.md/
- Thanos Query UI at http://localhost:10902 - global view of metrics
- Prometheus at http://localhost:9090 - local data only
- Sidecar uploads blocks to object storage for long-term retention once it is given an object storage configuration (the compose file here does not set one)
- Add a Store Gateway to query historical data from object storage
- Supports multiple Prometheus instances for HA and federation
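A Store Gateway could be added along the following lines. This is a sketch under assumptions, not part of the recipe: it is written as a `docker-compose.override.yml`, and the service name `thanos-store`, the volume `thanos_store_data`, and the mounted `./thanos/bucket.yml` object storage configuration are hypothetical names you would need to supply.

```yaml
# docker-compose.override.yml (assumed file name; merged with docker-compose.yml)
services:
  thanos-store:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-store
    command:
      - store
      - --data-dir=/var/thanos/store                   # local cache for block metadata
      - --objstore.config-file=/etc/thanos/bucket.yml  # hypothetical bucket config
    volumes:
      - ./thanos/bucket.yml:/etc/thanos/bucket.yml:ro
      - thanos_store_data:/var/thanos/store
    networks:
      - thanos-network

  thanos-query:
    # The override replaces the command list, so the sidecar endpoint is repeated.
    command:
      - query
      - --endpoint=thanos-sidecar:10901
      - --endpoint=thanos-store:10901

volumes:
  thanos_store_data:
```

With both endpoints registered, Thanos Query reads recent data from the Sidecar and older blocks from the bucket through the Store Gateway.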
Individual Services (3 services)
Copy individual services to mix and match with your existing compose files.
prometheus
```yaml
prometheus:
  image: prom/prometheus:latest
  container_name: prometheus
  restart: unless-stopped
  command:
    - "--config.file=/etc/prometheus/prometheus.yml"
    - "--storage.tsdb.path=/prometheus"
    - "--storage.tsdb.max-block-duration=2h"
    - "--storage.tsdb.min-block-duration=2h"
  volumes:
    - ./prometheus:/etc/prometheus
    - prometheus_data:/prometheus
  ports:
    - "9090:9090"
  networks:
    - thanos-network
```
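The prometheus service mounts `./prometheus` over `/etc/prometheus` and reads `prometheus.yml` from it, so that file has to exist before the stack starts. The recipe does not show one; a minimal sketch follows. The label names and values (`cluster: demo`, `replica: prometheus-0`) are assumptions chosen for illustration. Thanos expects each Prometheus to carry external labels that identify it uniquely.

```yaml
# ./prometheus/prometheus.yml (minimal sketch, values are illustrative)
global:
  scrape_interval: 15s
  external_labels:
    cluster: demo            # hypothetical label identifying this deployment
    replica: prometheus-0    # hypothetical label identifying this replica

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]   # Prometheus scraping itself
```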
thanos-sidecar
```yaml
thanos-sidecar:
  image: quay.io/thanos/thanos:latest
  container_name: thanos-sidecar
  command:
    - sidecar
    - "--prometheus.url=http://prometheus:9090"
    - "--tsdb.path=/prometheus"
  volumes:
    - prometheus_data:/prometheus
  depends_on:
    - prometheus
  networks:
    - thanos-network
```
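As written, this service has no object storage configuration, so it only serves local Prometheus data. One way to enable uploads is sketched below. It assumes a hypothetical `./thanos/bucket.yml` file describing your bucket and a storage backend (MinIO or another S3-compatible service) reachable from the container; neither is provided by the recipe.

```yaml
thanos-sidecar:
  image: quay.io/thanos/thanos:latest
  container_name: thanos-sidecar
  command:
    - sidecar
    - "--prometheus.url=http://prometheus:9090"
    - "--tsdb.path=/prometheus"
    - "--objstore.config-file=/etc/thanos/bucket.yml"   # enables block upload
  volumes:
    - prometheus_data:/prometheus
    - ./thanos/bucket.yml:/etc/thanos/bucket.yml:ro     # hypothetical bucket config
  depends_on:
    - prometheus
  networks:
    - thanos-network
```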
thanos-query
```yaml
thanos-query:
  image: quay.io/thanos/thanos:latest
  container_name: thanos-query
  command:
    - query
    - "--endpoint=thanos-sidecar:10901"
  ports:
    - "10902:10902"
  depends_on:
    - thanos-sidecar
  networks:
    - thanos-network
```
Quick Start
```bash
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.max-block-duration=2h
      - --storage.tsdb.min-block-duration=2h
    volumes:
      - ./prometheus:/etc/prometheus
      - prometheus_data:/prometheus
    ports:
      - "9090:9090"
    networks:
      - thanos-network

  thanos-sidecar:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-sidecar
    command:
      - sidecar
      - --prometheus.url=http://prometheus:9090
      - --tsdb.path=/prometheus
    volumes:
      - prometheus_data:/prometheus
    depends_on:
      - prometheus
    networks:
      - thanos-network

  thanos-query:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-query
    command:
      - query
      - --endpoint=thanos-sidecar:10901
    ports:
      - "10902:10902"
    depends_on:
      - thanos-sidecar
    networks:
      - thanos-network

volumes:
  prometheus_data:

networks:
  thanos-network:
    driver: bridge
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# Thanos configuration
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
```
One-Liner
Run this command to download and set up the recipe in one step:
```bash
curl -fsSL https://docker.recipes/api/recipes/thanos/run | bash
```
Troubleshooting
- Thanos Sidecar fails to start: Ensure prometheus_data volume has correct permissions and Prometheus is running with TSDB block duration set to 2h
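- Query returns incomplete data: Verify that the Thanos Sidecar is connected to Thanos Query; for data older than Prometheus's local retention, also confirm that blocks are being uploaded to object storage and that a Store Gateway serves them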
- High memory usage in Thanos Query: Increase memory limits and check for large query ranges that might be loading too much historical data
- Prometheus blocks not uploading: Check object storage configuration and network connectivity between Sidecar and storage backend
- Thanos Query endpoint timeouts: Verify Sidecar gRPC endpoint is accessible on port 10901 and not blocked by firewall rules
- Duplicate metrics in query results: Configure proper deduplication labels in Thanos Query and ensure Prometheus external labels are set correctly
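For the duplicate-metrics case, deduplication in Thanos Query is driven by the `--query.replica-label` flag. The sketch below assumes that every Prometheus replica sets an external label named `replica` with a distinct value and otherwise identical external labels; the label name is an assumption, so use whichever label your replicas actually differ on.

```yaml
services:
  thanos-query:
    image: quay.io/thanos/thanos:latest
    command:
      - query
      - --endpoint=thanos-sidecar:10901
      # Series that differ only in this label are merged at query time.
      - --query.replica-label=replica   # assumed external label name
```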
Components
thanos, prometheus, minio
Tags
#thanos #prometheus #ha #long-term-storage #metrics
Category
Monitoring & Observability