01. Why Monitoring Matters for Self-Hosters
Running services without monitoring is like driving without a dashboard — you have no idea how fast you're going, how much fuel you have left, or whether the engine is about to overheat. I learned this when my Nextcloud instance quietly ran out of disk space over a weekend, corrupting several files before I noticed on Monday morning.
Since then, I've run a Prometheus + Grafana monitoring stack alongside every service I deploy. It's caught disk issues, memory leaks, certificate expirations, and performance degradation before they became real problems. The setup takes about an hour and runs on minimal resources.
This guide walks you through setting up a complete monitoring stack with Docker Compose, including pre-built dashboards and meaningful alerts.
02. Understanding the Monitoring Stack
The modern monitoring stack has three layers:
Collection: Prometheus scrapes metrics from your services every 15 seconds. It pulls data from exporters — small programs that expose metrics in a standard format. There are exporters for almost everything: Node Exporter for system metrics, cAdvisor for container metrics, Blackbox Exporter for endpoint probing.
Visualization: Grafana turns raw metrics into beautiful dashboards. It queries Prometheus and renders graphs, gauges, tables, and heatmaps. The community has created thousands of pre-built dashboards you can import with one click.
Alerting: Alertmanager or Grafana Alerting sends notifications when things go wrong. You define rules like "alert me if disk usage exceeds 85%" and get notified via Telegram, Discord, Slack, or email.
All three components run as Docker containers and work together seamlessly.
03. Setting Up the Stack
Here's a minimal monitoring stack that gives you system metrics, container metrics, and a Grafana dashboard:
[docker-compose.yml]
```yaml
services:
  prometheus:
    image: prom/prometheus:v2.53.0
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus_data:/prometheus
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.retention.time=30d"
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:11.1.0
    container_name: grafana
    restart: unless-stopped
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
    ports:
      - "3000:3000"

  node-exporter:
    image: prom/node-exporter:v1.8.1
    container_name: node-exporter
    restart: unless-stopped
    pid: host
    command:
      # Point the exporter at the host filesystems mounted below,
      # so it reports host metrics rather than the container's.
      - "--path.procfs=/host/proc"
      - "--path.sysfs=/host/sys"
      - "--path.rootfs=/rootfs"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.49.1
    container_name: cadvisor
    restart: unless-stopped
    privileged: true
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro

volumes:
  prometheus_data:
  grafana_data:
```

Start with 30 days of data retention. Prometheus is efficient — even with dozens of metrics scraped every 15 seconds, a month of data typically uses less than 1GB.
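The compose file mounts a ./prometheus.yml that isn't shown above. A minimal version might look like the sketch below; it assumes the default Compose network (where containers resolve each other by service name) and each exporter's default port, and the job names are my own choice:

```yaml
global:
  scrape_interval: 15s  # matches the 15-second scrape cadence described earlier

scrape_configs:
  - job_name: "prometheus"          # Prometheus monitoring itself
    static_configs:
      - targets: ["prometheus:9090"]

  - job_name: "node-exporter"       # host CPU, memory, disk, network
    static_configs:
      - targets: ["node-exporter:9100"]

  - job_name: "cadvisor"            # per-container metrics
    static_configs:
      - targets: ["cadvisor:8080"]
```

After editing this file, restart the Prometheus container and check http://localhost:9090/targets to confirm all three jobs show as UP.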
04. Essential Dashboards
After starting the stack, open Grafana at http://localhost:3000, add Prometheus as a data source (URL: http://prometheus:9090), and import these community dashboards by ID:
Node Exporter Full (ID: 1860): Shows CPU, memory, disk, network, and dozens of other system metrics. The most popular dashboard on Grafana.com for good reason.
Docker Container Monitoring (ID: 893): Per-container CPU, memory, network, and disk usage. See at a glance which containers are consuming the most resources.
To import: click + in Grafana, select "Import dashboard," enter the ID, select your Prometheus data source, and click Import.
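If you'd rather not add the data source by hand, Grafana can also provision it from a file at startup. A sketch, assuming you mount a file into /etc/grafana/provisioning/datasources/ (the directory Grafana scans on boot):

```yaml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy               # Grafana's backend queries Prometheus, not the browser
    url: http://prometheus:9090 # container name on the shared Compose network
    isDefault: true
```

To use it, add a volume like `./datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml:ro` (the local file name is my own) to the grafana service.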
I also recommend creating a custom "Overview" dashboard with the metrics you care about most. Mine shows total CPU usage, available disk space, container count, and uptime for critical services. Browse our monitoring recipes for complete configurations with pre-built dashboards.
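As a starting point for that kind of overview panel, here are PromQL queries in the spirit of the ones I use. Exact label values (mountpoints, metric names on unusual platforms) can vary, so treat these as sketches:

```promql
# Total CPU usage (%): 100 minus the average idle time across all cores
100 - (avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Available disk space (%) on the root filesystem
node_filesystem_avail_bytes{mountpoint="/"}
  / node_filesystem_size_bytes{mountpoint="/"} * 100

# Container count (cAdvisor exposes one series per named container)
count(container_last_seen{name=~".+"})

# Host uptime in seconds
time() - node_boot_time_seconds
```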
05. Setting Up Meaningful Alerts
The key to good alerting is being selective. Alert on things that require action. My recommended starter alerts:
- Disk space below 15% on any volume
- Any container restarting more than 3 times in 5 minutes
- Host memory usage above 90% for 10+ minutes
- SSL certificate expiring within 14 days
- Any HTTP endpoint returning non-200 for 2+ minutes
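Two of these starter alerts, written as Prometheus alerting rules. This is a sketch: the file name, group name, and "for" durations are my own, and you could equally define the same conditions in Grafana Alerting's UI instead:

```yaml
# alerts.yml (hypothetical name), loaded from prometheus.yml via:
#   rule_files:
#     - "alerts.yml"
groups:
  - name: starter-alerts
    rules:
      - alert: DiskSpaceLow
        # Fires when any real filesystem stays below 15% free for 5 minutes
        expr: node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes * 100 < 15
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Disk below 15% free on {{ $labels.mountpoint }}"

      - alert: HighMemoryUsage
        # Fires when host memory usage stays above 90% for 10 minutes
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Host memory above 90% for 10+ minutes"
```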
For personal projects, Telegram notifications are my favorite — alerts show up on my phone instantly. Configure through Grafana's built-in alerting, which supports dozens of notification channels.
Avoid alert fatigue. If you get more than a few alerts per week, your thresholds are too aggressive. Each alert should represent something you actually need to investigate.