
Apache Druid Analytics

advanced

Druid for real-time analytics with ZooKeeper.

Overview

Apache Druid is a high-performance, column-oriented, distributed data store designed for fast slice-and-dice analytics on large datasets. Originally developed at Metamarkets and later open-sourced, Druid excels at ingesting streaming data and providing sub-second query responses for OLAP workloads, making it ideal for real-time dashboards, user-facing analytics, and interactive data exploration. Unlike traditional databases, Druid pre-aggregates data at ingestion time and uses advanced indexing techniques to achieve fast query performance on time-series data.

This stack combines Druid's distributed architecture with PostgreSQL as the metadata store and ZooKeeper for service coordination. PostgreSQL maintains Druid's cluster metadata, segment information, and configuration data with ACID compliance, while ZooKeeper handles service discovery, leader election, and coordination between Druid's various node types. The configuration deploys three essential Druid services: the Coordinator manages data availability and balancing, the Broker routes queries and merges results, and the Router provides a unified query endpoint for client applications.

Data engineers and analysts working with high-volume streaming data should consider this stack when traditional databases cannot meet sub-second query requirements. It particularly benefits organizations ingesting millions of events per day from sources like web analytics, IoT sensors, or application logs, where users need to drill into data with complex filters and aggregations in real time. The stack scales horizontally and provides the foundation for responsive analytics applications that handle both batch and streaming ingestion.
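
Once the stack is up, the Router's SQL endpoint offers a quick way to see that topology in action. A minimal sketch, assuming the default ports from the .env template below; it queries Druid's built-in sys.servers system table, so it works before any datasource exists:

terminal
# Ask the cluster (via the Router's unified endpoint) which nodes have registered
curl -s -X POST http://localhost:8888/druid/v2/sql \
  -H 'Content-Type: application/json' \
  -d '{"query": "SELECT server, server_type, tier FROM sys.servers"}'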

Key Features

  • Sub-second OLAP queries on billion-row datasets through columnar storage and bitmap indexing
  • Real-time data ingestion from Kafka streams with automatic rollup and pre-aggregation
  • Automatic data tiering between hot and cold storage based on time-based rules
  • Built-in approximation algorithms for fast cardinality estimation and quantile calculations (see the query sketch after this list)
  • Native JSON support for nested data structures and flexible schema evolution
  • Automatic query parallelization across Historical nodes with intelligent data pruning
  • PostgreSQL-backed metadata storage ensuring ACID compliance for cluster state
  • ZooKeeper coordination enabling automatic failover and service discovery across nodes
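
The approximation features above are exposed directly in Druid SQL. A hedged sketch, assuming a hypothetical datasource named web_events with a user_id column has already been ingested (nothing in this stack creates it):

terminal
# Approximate distinct count: bounded memory and fast, at the cost of exactness
curl -s -X POST http://localhost:8888/druid/v2/sql \
  -H 'Content-Type: application/json' \
  -d '{"query": "SELECT APPROX_COUNT_DISTINCT(user_id) AS unique_users FROM web_events"}'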

Common Use Cases

  • Real-time web analytics dashboards showing user behavior and conversion funnels
  • IoT sensor data analysis for manufacturing equipment monitoring and predictive maintenance
  • Financial trading platforms requiring millisecond query responses on market data
  • Gaming analytics tracking player behavior and in-game economy metrics
  • Advertising technology platforms analyzing bid requests and campaign performance
  • Network monitoring systems processing log data for security and performance insights
  • E-commerce recommendation engines analyzing clickstream data for personalization

Prerequisites

  • Minimum 4GB RAM (Druid Broker and Coordinator require 1-2GB each, plus PostgreSQL and ZooKeeper overhead)
  • Available ports 8081 (Coordinator), 8082 (Broker), 8888 (Router), 5432 (PostgreSQL), 2181 (ZooKeeper)
  • Understanding of OLAP concepts, time-series data modeling, and data warehouse architectures
  • Familiarity with JSON-based data ingestion specifications and Druid's segment structure
  • Basic knowledge of JVM tuning for optimal memory allocation across Druid services (a sizing sketch follows this list)
  • Experience with PostgreSQL administration for metadata store maintenance and backups
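
On JVM sizing: the apache/druid image reads heap and direct-memory settings from environment variables at startup. A sketch of what that looks like for the Broker; the DRUID_XMS/DRUID_XMX/DRUID_MAXDIRECTMEMORYSIZE values here are arbitrary picks for a 4GB host, and the variable names should be confirmed against your image version:

broker:
  image: apache/druid:latest
  command: broker
  environment:
    - druid_zk_service_host=zookeeper
    # JVM sizing picked up by the image's startup script; tune to your host
    - DRUID_XMS=512m
    - DRUID_XMX=1g
    - DRUID_MAXDIRECTMEMORYSIZE=1g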

This recipe is intended for development and testing. Review security settings, change default credentials, and test thoroughly before production use.

docker-compose.yml

docker-compose.yml
services:
  postgres:
    image: postgres:15-alpine
    container_name: druid-postgres
    restart: unless-stopped
    environment:
      - POSTGRES_USER=druid
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=druid
    volumes:
      - postgres_data:/var/lib/postgresql/data

  zookeeper:
    image: zookeeper:3.9
    container_name: druid-zk
    restart: unless-stopped
    volumes:
      - zk_data:/data

  coordinator:
    image: apache/druid:latest
    container_name: druid-coordinator
    restart: unless-stopped
    ports:
      - "${COORDINATOR_PORT:-8081}:8081"
    environment:
      # The image maps env vars prefixed druid_ to runtime properties
      # (druid_a_b -> druid.a.b); without the connector settings below,
      # the Coordinator cannot actually reach the PostgreSQL metadata store
      - druid_metadata_storage_type=postgresql
      - druid_metadata_storage_connector_connectURI=jdbc:postgresql://postgres:5432/druid
      - druid_metadata_storage_connector_user=druid
      - druid_metadata_storage_connector_password=${POSTGRES_PASSWORD}
      - druid_zk_service_host=zookeeper
    command: coordinator
    depends_on:
      - postgres
      - zookeeper

  broker:
    image: apache/druid:latest
    container_name: druid-broker
    restart: unless-stopped
    ports:
      - "${BROKER_PORT:-8082}:8082"
    environment:
      - druid_zk_service_host=zookeeper
    command: broker
    depends_on:
      - zookeeper

  router:
    image: apache/druid:latest
    container_name: druid-router
    restart: unless-stopped
    ports:
      - "${ROUTER_PORT:-8888}:8888"
    environment:
      # The Router discovers Brokers through ZooKeeper as well
      - druid_zk_service_host=zookeeper
    command: router
    depends_on:
      - zookeeper

volumes:
  postgres_data:
  zk_data:

.env Template

.env
# Apache Druid
COORDINATOR_PORT=8081
BROKER_PORT=8082
ROUTER_PORT=8888
POSTGRES_PASSWORD=druid_password
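
The shipped POSTGRES_PASSWORD is a placeholder; change it before the first start, since PostgreSQL initializes its data volume with it. One way to generate a replacement value:

terminal
# Print a random 32-character hex string to paste into POSTGRES_PASSWORD
openssl rand -hex 16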

Usage Notes

  1. Druid web console at http://localhost:8888 (served by the Router)
  2. Coordinator API and console at http://localhost:8081
  3. This minimal stack has no Historical or MiddleManager nodes, so it cannot store or ingest data yet; add them for any real workload (a sketch follows this list)
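
A hedged sketch of the missing data tier, modeled on the services above: Historical nodes serve stored segments (8083 is the default port) and a MiddleManager runs ingestion tasks. The segment-cache mount path follows the image's stock layout and should be verified against your Druid version:

historical:
  image: apache/druid:latest
  container_name: druid-historical
  restart: unless-stopped
  ports:
    - "8083:8083"
  environment:
    - druid_zk_service_host=zookeeper
  command: historical
  volumes:
    # Where the image keeps its segment cache by default; verify the path
    - historical_var:/opt/druid/var
  depends_on:
    - zookeeper
    - coordinator

Remember to declare historical_var under the top-level volumes: key, and add a similar service with command: middleManager for ingestion.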

Individual Services (5 services)

Copy individual services to mix and match with your existing compose files.

postgres
postgres:
  image: postgres:15-alpine
  container_name: druid-postgres
  restart: unless-stopped
  environment:
    - POSTGRES_USER=druid
    - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    - POSTGRES_DB=druid
  volumes:
    - postgres_data:/var/lib/postgresql/data
zookeeper
zookeeper:
  image: zookeeper:3.9
  container_name: druid-zk
  restart: unless-stopped
  volumes:
    - zk_data:/data
coordinator
coordinator:
  image: apache/druid:latest
  container_name: druid-coordinator
  restart: unless-stopped
  ports:
    - ${COORDINATOR_PORT:-8081}:8081
  environment:
    - druid_metadata_storage_type=postgresql
    - druid_metadata_storage_connector_connectURI=jdbc:postgresql://postgres:5432/druid
    - druid_metadata_storage_connector_user=druid
    - druid_metadata_storage_connector_password=${POSTGRES_PASSWORD}
    - druid_zk_service_host=zookeeper
  command: coordinator
  depends_on:
    - postgres
    - zookeeper
broker
broker:
  image: apache/druid:latest
  container_name: druid-broker
  restart: unless-stopped
  ports:
    - ${BROKER_PORT:-8082}:8082
  environment:
    - druid_zk_service_host=zookeeper
  command: broker
router
router:
  image: apache/druid:latest
  container_name: druid-router
  restart: unless-stopped
  ports:
    - ${ROUTER_PORT:-8888}:8888
  environment:
    - druid_zk_service_host=zookeeper
  command: router

Quick Start

terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  postgres:
    image: postgres:15-alpine
    container_name: druid-postgres
    restart: unless-stopped
    environment:
      - POSTGRES_USER=druid
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=druid
    volumes:
      - postgres_data:/var/lib/postgresql/data

  zookeeper:
    image: zookeeper:3.9
    container_name: druid-zk
    restart: unless-stopped
    volumes:
      - zk_data:/data

  coordinator:
    image: apache/druid:latest
    container_name: druid-coordinator
    restart: unless-stopped
    ports:
      - "${COORDINATOR_PORT:-8081}:8081"
    environment:
      - druid_metadata_storage_type=postgresql
      - druid_metadata_storage_connector_connectURI=jdbc:postgresql://postgres:5432/druid
      - druid_metadata_storage_connector_user=druid
      - druid_metadata_storage_connector_password=${POSTGRES_PASSWORD}
      - druid_zk_service_host=zookeeper
    command: coordinator
    depends_on:
      - postgres
      - zookeeper

  broker:
    image: apache/druid:latest
    container_name: druid-broker
    restart: unless-stopped
    ports:
      - "${BROKER_PORT:-8082}:8082"
    environment:
      - druid_zk_service_host=zookeeper
    command: broker
    depends_on:
      - zookeeper

  router:
    image: apache/druid:latest
    container_name: druid-router
    restart: unless-stopped
    ports:
      - "${ROUTER_PORT:-8888}:8888"
    environment:
      - druid_zk_service_host=zookeeper
    command: router
    depends_on:
      - zookeeper

volumes:
  postgres_data:
  zk_data:
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# Apache Druid
COORDINATOR_PORT=8081
BROKER_PORT=8082
ROUTER_PORT=8888
POSTGRES_PASSWORD=druid_password
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
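
Druid services can take a minute to register with ZooKeeper after the containers start. Each node exposes a /status/health endpoint that returns true once it is ready:

terminal
# 5. Verify each service reports healthy
curl -s http://localhost:8081/status/health   # Coordinator
curl -s http://localhost:8082/status/health   # Broker
curl -s http://localhost:8888/status/health   # Router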

One-Liner

Run this command to download and set up the recipe in one step:

terminal
curl -fsSL https://docker.recipes/api/recipes/druid-analytics-cluster/run | bash

Troubleshooting

  • Coordinator shows 'No segments available' error: Verify Historical nodes are running and connected to ZooKeeper, check segment loading rules in Coordinator console
  • Queries timeout or return incomplete results: Increase Broker query timeout settings and verify all Historical nodes are healthy and segments are distributed
  • PostgreSQL connection failures during startup: Ensure PostgreSQL is fully initialized before Druid services start; add healthcheck dependencies between containers (see the sketch after this list)
  • ZooKeeper connection lost errors: Verify ZooKeeper has sufficient memory allocation and stable network connectivity, check ZooKeeper logs for session timeouts
  • High memory usage and OOM crashes: Adjust JVM heap settings using DRUID_XMX and DRUID_XMS environment variables based on available container resources
  • Ingestion tasks fail with metadata errors: Check PostgreSQL disk space and connection pool settings, verify Druid metadata tables are properly initialized
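
For the startup-ordering issue above, Compose can gate Druid on PostgreSQL actually accepting connections rather than merely having started. A sketch using pg_isready, which ships in the postgres image:

postgres:
  image: postgres:15-alpine
  # ...existing settings from the recipe...
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U druid -d druid"]
    interval: 5s
    timeout: 5s
    retries: 10

coordinator:
  # ...existing settings from the recipe...
  depends_on:
    postgres:
      condition: service_healthy
    zookeeper:
      condition: service_started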


