docker.recipes

Thanos

advanced

Highly available Prometheus setup with long-term storage capabilities.

Overview

Thanos is a set of components that can be composed into a highly available metrics system with effectively unlimited storage capacity, added on top of existing Prometheus deployments. Originally developed at Improbable, Thanos addresses Prometheus's limitations around high availability, long-term retention, and global querying by using object storage for retention and a distributed architecture for horizontal scalability.

This stack combines Prometheus for metrics collection with the Thanos Sidecar and Thanos Query components. The Sidecar runs alongside Prometheus and provides real-time access to its local data. When an object storage backend is configured (MinIO in the complete setup), the Sidecar also uploads 2-hour TSDB blocks to it; the compose file on this page does not include that configuration. Thanos Query acts as a global query layer, federating data from multiple Prometheus instances and, once a Store Gateway is added, historical data from object storage, behind a single PromQL interface that spans time ranges and locations.

Platform engineers and SREs managing multi-cluster Kubernetes environments, large-scale microservices, or any infrastructure requiring long-term metrics retention should consider this stack. Unlike a standalone Prometheus deployment, which is limited by local storage and a single instance, this configuration enables horizontal scaling, cross-datacenter metric queries, and cost-effective long-term storage on object storage backends. That makes it a good fit for organizations outgrowing a single Prometheus instance.

Key Features

  • Global query interface via Thanos Query for federated metrics across multiple Prometheus instances
  • Unlimited long-term storage retention using object storage backends like MinIO or cloud storage
  • Horizontal scaling of Prometheus deployments without losing query capabilities across instances
  • Automatic TSDB block upload to object storage with configurable 2-hour block duration
  • PromQL compatibility maintained across historical and real-time data through unified query layer
  • High availability setup supporting multiple Prometheus replicas with deduplication
  • Cost-effective storage tiering with hot data in Prometheus and cold data in object storage
  • Cross-cluster and multi-region metrics federation with single query endpoint

Common Use Cases

  • Multi-cluster Kubernetes monitoring with centralized metrics querying across all clusters
  • Long-term capacity planning and trend analysis requiring years of historical metrics data
  • Highly available Prometheus setups with multiple replicas and automatic failover capabilities
  • Cost optimization for metrics storage by moving historical data to cheaper object storage
  • Cross-datacenter monitoring with unified dashboards spanning multiple geographic regions
  • Compliance requirements demanding long-term metrics retention for auditing purposes
  • Large-scale microservices monitoring where single Prometheus instances hit storage limits

Prerequisites

  • Minimum 2GB RAM available for Prometheus and Thanos components combined
  • Object storage backend (MinIO, S3, or GCS) for long-term block storage configured separately
  • Ports 9090 (Prometheus), 10901 (Thanos Sidecar), and 10902 (Thanos Query) available
  • Understanding of PromQL and Prometheus architecture for effective troubleshooting
  • Knowledge of TSDB block structure and compaction processes for storage optimization
  • Sufficient disk space for local Prometheus storage before blocks are uploaded to object storage

For development & testing. Review security settings, change default credentials, and test thoroughly before production use.

docker-compose.yml

docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.max-block-duration=2h
      - --storage.tsdb.min-block-duration=2h
    volumes:
      - ./prometheus:/etc/prometheus
      - prometheus_data:/prometheus
    ports:
      - "9090:9090"
    networks:
      - thanos-network

  thanos-sidecar:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-sidecar
    command:
      - sidecar
      - --prometheus.url=http://prometheus:9090
      - --tsdb.path=/prometheus
    volumes:
      - prometheus_data:/prometheus
    depends_on:
      - prometheus
    networks:
      - thanos-network

  thanos-query:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-query
    command:
      - query
      - --endpoint=thanos-sidecar:10901
    ports:
      - "10902:10902"
    depends_on:
      - thanos-sidecar
    networks:
      - thanos-network

volumes:
  prometheus_data:

networks:
  thanos-network:
    driver: bridge
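
The compose file mounts ./prometheus into the container and starts Prometheus with --config.file=/etc/prometheus/prometheus.yml, but the recipe does not list that file. A minimal sketch of creating one is shown here. The label names and values (cluster, replica), the scrape interval, and the single self-scrape job are assumptions for illustration, not part of the recipe. The external_labels block is included because the Thanos Sidecar generally expects Prometheus to carry uniquely identifying external labels.

```shell
#!/bin/sh
# Create the directory that the compose file mounts at /etc/prometheus.
mkdir -p prometheus

# Write a minimal Prometheus config. All values below are illustrative assumptions.
cat > prometheus/prometheus.yml << 'EOF'
global:
  scrape_interval: 15s
  # External labels identify this Prometheus instance to Thanos.
  external_labels:
    cluster: demo
    replica: prometheus-0

scrape_configs:
  # Prometheus scraping its own metrics endpoint.
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]
EOF

echo "wrote prometheus/prometheus.yml"
```

When running several Prometheus replicas, each one would need a different replica value so that Thanos Query can tell them apart.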

.env Template

.env
# Thanos configuration

Usage Notes

  1. Docs: https://thanos.io/tip/thanos/getting-started.md/
  2. Thanos Query UI at http://localhost:10902 - global view of metrics
  3. Prometheus at http://localhost:9090 - local data only
  4. Sidecar uploads blocks to object storage for long-term retention once an object storage configuration is supplied (this compose file does not include one)
  5. Add Store Gateway for querying historical data from object storage
  6. Supports multiple Prometheus instances for HA and federation
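
The notes above mention block uploads to object storage, which the Sidecar only performs when it is given an object storage configuration. The following is a rough sketch of what that file could look like for an S3-compatible backend such as MinIO. The directory name thanos, the bucket name, the minio:9000 endpoint, and the credentials are all placeholders assumed for illustration; the recipe itself does not define a MinIO service.

```shell
#!/bin/sh
# Directory to hold the Thanos object storage config (name is an assumption).
mkdir -p thanos

# Thanos object storage config for an S3-compatible backend.
# Bucket, endpoint, and credentials are placeholders; replace them.
cat > thanos/bucket.yml << 'EOF'
type: S3
config:
  bucket: thanos
  endpoint: minio:9000
  access_key: changeme
  secret_key: changeme
  insecure: true
EOF

# To use it, the thanos-sidecar service would additionally need (not applied here):
#   volumes:  - ./thanos:/etc/thanos
#   command:  - --objstore.config-file=/etc/thanos/bucket.yml
echo "wrote thanos/bucket.yml"
```

The insecure: true setting assumes plain HTTP to a local MinIO; it should be dropped for a TLS endpoint.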

Individual Services (3 services)

Copy individual services to mix and match with your existing compose files.

prometheus
prometheus:
  image: prom/prometheus:latest
  container_name: prometheus
  restart: unless-stopped
  command:
    - "--config.file=/etc/prometheus/prometheus.yml"
    - "--storage.tsdb.path=/prometheus"
    - "--storage.tsdb.max-block-duration=2h"
    - "--storage.tsdb.min-block-duration=2h"
  volumes:
    - ./prometheus:/etc/prometheus
    - prometheus_data:/prometheus
  ports:
    - "9090:9090"
  networks:
    - thanos-network
thanos-sidecar
thanos-sidecar:
  image: quay.io/thanos/thanos:latest
  container_name: thanos-sidecar
  command:
    - sidecar
    - "--prometheus.url=http://prometheus:9090"
    - "--tsdb.path=/prometheus"
  volumes:
    - prometheus_data:/prometheus
  depends_on:
    - prometheus
  networks:
    - thanos-network
thanos-query
thanos-query:
  image: quay.io/thanos/thanos:latest
  container_name: thanos-query
  command:
    - query
    - "--endpoint=thanos-sidecar:10901"
  ports:
    - "10902:10902"
  depends_on:
    - thanos-sidecar
  networks:
    - thanos-network

Quick Start

terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.max-block-duration=2h
      - --storage.tsdb.min-block-duration=2h
    volumes:
      - ./prometheus:/etc/prometheus
      - prometheus_data:/prometheus
    ports:
      - "9090:9090"
    networks:
      - thanos-network

  thanos-sidecar:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-sidecar
    command:
      - sidecar
      - --prometheus.url=http://prometheus:9090
      - --tsdb.path=/prometheus
    volumes:
      - prometheus_data:/prometheus
    depends_on:
      - prometheus
    networks:
      - thanos-network

  thanos-query:
    image: quay.io/thanos/thanos:latest
    container_name: thanos-query
    command:
      - query
      - --endpoint=thanos-sidecar:10901
    ports:
      - "10902:10902"
    depends_on:
      - thanos-sidecar
    networks:
      - thanos-network

volumes:
  prometheus_data:

networks:
  thanos-network:
    driver: bridge
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# Thanos configuration
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
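
After the services start, it can help to confirm that the two published HTTP ports respond. The sketch below polls the readiness paths on Prometheus (9090) and Thanos Query (10902). It assumes curl is installed on the host and that both components serve /-/ready on their HTTP ports; the check_ready helper is a hypothetical name introduced here.

```shell
#!/bin/sh
# Readiness URLs on the host ports published by the compose file.
ENDPOINTS="prometheus=http://localhost:9090/-/ready thanos-query=http://localhost:10902/-/ready"

# Returns 0 if the URL responds without an HTTP error within 5 seconds.
check_ready() {
  curl -fsS --max-time 5 "$1" >/dev/null 2>&1
}

for entry in $ENDPOINTS; do
  name=${entry%%=*}   # text before the first "="
  url=${entry#*=}     # text after the first "="
  if check_ready "$url"; then
    echo "$name: ready"
  else
    echo "$name: not ready ($url)"
  fi
done
```

The Sidecar is not checked from the host because the compose file does not publish any of its ports.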

One-Liner

Run this command to download and set up the recipe in one step:

terminal
curl -fsSL https://docker.recipes/api/recipes/thanos/run | bash

Troubleshooting

  • Thanos Sidecar fails to start: Ensure prometheus_data volume has correct permissions and Prometheus is running with TSDB block duration set to 2h
  • Query returns incomplete data: Verify Thanos Sidecar is properly connected and blocks are being uploaded to object storage successfully
  • High memory usage in Thanos Query: Increase memory limits and check for large query ranges that might be loading too much historical data
  • Prometheus blocks not uploading: Check object storage configuration and network connectivity between Sidecar and storage backend
  • Thanos Query endpoint timeouts: Verify Sidecar gRPC endpoint is accessible on port 10901 and not blocked by firewall rules
  • Duplicate metrics in query results: Configure proper deduplication labels in Thanos Query and ensure Prometheus external labels are set correctly
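
Several of the items above come down to whether Thanos Query can see the Sidecar. As a starting point, the commands below ask Query which Store API endpoints it knows about and print recent Sidecar log lines. This is a sketch under assumptions: it presumes curl and docker compose are available on the host, that the stack was started from the current directory, and that the Query HTTP API exposes /api/v1/stores as used by its UI. QUERY_URL is a variable introduced here for convenience.

```shell
#!/bin/sh
# Base URL of Thanos Query; override by exporting QUERY_URL beforehand.
QUERY_URL="${QUERY_URL:-http://localhost:10902}"

# List the Store API endpoints Thanos Query currently knows about.
# The Sidecar (thanos-sidecar:10901) should appear in the response.
curl -fsS --max-time 5 "$QUERY_URL/api/v1/stores" \
  || echo "could not reach Thanos Query at $QUERY_URL"

# Show the last 50 Sidecar log lines to look for startup or upload errors.
docker compose logs --tail=50 thanos-sidecar \
  || echo "could not read thanos-sidecar logs (is the stack running from this directory?)"
```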


