Apache Druid Analytics
Apache Druid for real-time analytics, backed by PostgreSQL metadata storage and ZooKeeper coordination.
Overview
Apache Druid is a high-performance, column-oriented, distributed data store designed for fast slice-and-dice analytics on large datasets. Originally developed at Metamarkets and later open-sourced, Druid excels at ingesting streaming data and providing sub-second query responses for OLAP workloads, making it ideal for real-time dashboards, user-facing analytics, and interactive data exploration. Unlike traditional databases, Druid can pre-aggregate (roll up) data at ingestion time and uses advanced indexing techniques to achieve very fast query performance on time-series data.
This stack combines Druid's distributed architecture with PostgreSQL as the metadata store and ZooKeeper for service coordination. PostgreSQL maintains Druid's cluster metadata, segment information, and configuration data with ACID compliance, while ZooKeeper handles service discovery, leader election, and coordination between Druid's various node types. The configuration deploys three essential Druid services: the Coordinator manages data availability and balancing, the Broker handles query routing and result merging, and the Router provides a unified query endpoint for client applications. Historical and MiddleManager nodes, which store segments and run ingestion tasks, are not part of this minimal configuration and must be added for a fully working cluster (see Usage Notes).
Data engineers and analysts working with high-volume streaming data should consider this stack when traditional databases cannot meet sub-second query requirements. This combination particularly benefits organizations ingesting millions of events per day from sources like web analytics, IoT sensors, or application logs, where users need to drill down into data with complex filters and aggregations in real time. The stack scales horizontally and provides the foundation for building responsive analytics applications that can handle both batch and streaming data ingestion patterns.
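Once the stack is up, the Router is the single endpoint client applications talk to. As a quick sketch (assuming the default ports from the .env template below), you can confirm the cluster is responding by hitting the Router's status API and running a Druid SQL query against the built-in sys schema:
terminal
# Confirm the Router is responding (returns version and module info as JSON)
curl -s http://localhost:8888/status

# List the services that have joined the cluster; the Router proxies
# Druid SQL queries to a Broker
curl -s -X POST http://localhost:8888/druid/v2/sql \
  -H 'Content-Type: application/json' \
  -d '{"query": "SELECT server, server_type FROM sys.servers"}'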
Key Features
- Sub-second OLAP queries on billion-row datasets through columnar storage and bitmap indexing
- Real-time data ingestion from Kafka streams with automatic rollup and pre-aggregation
- Automatic data tiering between hot and cold storage based on time-based rules
- Built-in approximation algorithms for fast cardinality estimation and quantile calculations (example query after this list)
- Native JSON support for nested data structures and flexible schema evolution
- Automatic query parallelization across Historical nodes with intelligent data pruning
- PostgreSQL-backed metadata storage ensuring ACID compliance for cluster state
- ZooKeeper coordination enabling automatic failover and service discovery across nodes
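The approximation functions above are exposed directly in Druid SQL. The sketch below runs an approximate distinct count per hour through the Router; the clickstream datasource and user_id column are hypothetical placeholders for whatever your ingestion spec defines:
terminal
# Hypothetical datasource and column names -- replace with your own
curl -s -X POST http://localhost:8888/druid/v2/sql \
  -H 'Content-Type: application/json' \
  -d '{"query": "SELECT FLOOR(__time TO HOUR) AS hour, APPROX_COUNT_DISTINCT(user_id) AS unique_users FROM clickstream GROUP BY 1 ORDER BY 1 DESC LIMIT 24"}'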
Common Use Cases
- Real-time web analytics dashboards showing user behavior and conversion funnels
- IoT sensor data analysis for manufacturing equipment monitoring and predictive maintenance
- Financial trading platforms requiring millisecond query responses on market data
- Gaming analytics tracking player behavior and in-game economy metrics
- Advertising technology platforms analyzing bid requests and campaign performance
- Network monitoring systems processing log data for security and performance insights
- E-commerce recommendation engines analyzing clickstream data for personalization
Prerequisites
- Minimum 4GB RAM (Druid Broker and Coordinator require 1-2GB each, plus PostgreSQL and ZooKeeper overhead)
- Available ports 8081 (Coordinator), 8082 (Broker), 8888 (Router), 5432 (PostgreSQL), 2181 (ZooKeeper)
- Understanding of OLAP concepts, time-series data modeling, and data warehouse architectures
- Familiarity with JSON-based data ingestion specifications and Druid's segment structure
- Basic knowledge of JVM tuning for optimal memory allocation across Druid services (a heap-sizing sketch follows this list)
- Experience with PostgreSQL administration for metadata store maintenance and backups
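The apache/druid image reads the DRUID_XMS and DRUID_XMX environment variables to size each service's JVM heap (see Troubleshooting). A minimal sketch for the Broker, with only the relevant keys shown; the values are assumptions sized for the 4GB minimum rather than tuned recommendations:
broker:
  image: apache/druid:latest
  command: broker
  environment:
    - druid_zk_service_host=zookeeper
    - DRUID_XMS=512m   # initial JVM heap
    - DRUID_XMX=512m   # maximum JVM heap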
For development & testing only. Review security settings, change the default credentials (POSTGRES_PASSWORD in particular), and test thoroughly before production use.
docker-compose.yml
services:
  postgres:
    image: postgres:15-alpine
    container_name: druid-postgres
    restart: unless-stopped
    environment:
      - POSTGRES_USER=druid
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=druid
    volumes:
      - postgres_data:/var/lib/postgresql/data

  zookeeper:
    image: zookeeper:3.9
    container_name: druid-zk
    restart: unless-stopped
    volumes:
      - zk_data:/data

  coordinator:
    image: apache/druid:latest
    container_name: druid-coordinator
    restart: unless-stopped
    ports:
      - "${COORDINATOR_PORT:-8081}:8081"
    environment:
      - druid_metadata_storage_type=postgresql
      - druid_zk_service_host=zookeeper
    command: coordinator
    depends_on:
      - postgres
      - zookeeper

  broker:
    image: apache/druid:latest
    container_name: druid-broker
    restart: unless-stopped
    ports:
      - "${BROKER_PORT:-8082}:8082"
    environment:
      - druid_zk_service_host=zookeeper
    command: broker

  router:
    image: apache/druid:latest
    container_name: druid-router
    restart: unless-stopped
    ports:
      - "${ROUTER_PORT:-8888}:8888"
    command: router

volumes:
  postgres_data:
  zk_data:

.env Template
.env
# Apache Druid
COORDINATOR_PORT=8081
BROKER_PORT=8082
ROUTER_PORT=8888
POSTGRES_PASSWORD=druid_password

Usage Notes
- Druid Console at http://localhost:8888 (a quick smoke test is sketched below)
- Coordinator at http://localhost:8081
- Add Historical and MiddleManager nodes for production data storage and ingestion
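A minimal smoke test and teardown, assuming the default ports and this recipe's compose file:
terminal
# Each Druid service answers /status/health with "true" once it is up
curl -s http://localhost:8081/status/health   # Coordinator
curl -s http://localhost:8082/status/health   # Broker
curl -s http://localhost:8888/status/health   # Router

# Stop the stack; add -v to also delete the postgres_data and zk_data
# volumes, wiping metadata and ZooKeeper state for a fresh start
docker compose down
docker compose down -v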
Individual Services (5 services)
Copy individual services to mix and match with your existing compose files.
postgres
postgres:
  image: postgres:15-alpine
  container_name: druid-postgres
  restart: unless-stopped
  environment:
    - POSTGRES_USER=druid
    - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    - POSTGRES_DB=druid
  volumes:
    - postgres_data:/var/lib/postgresql/data
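Optional hardening against the startup race noted under Troubleshooting: give PostgreSQL a healthcheck so the Druid services can wait on it. A sketch of the amended service (the healthcheck timings are assumptions):
postgres:
  image: postgres:15-alpine
  container_name: druid-postgres
  restart: unless-stopped
  environment:
    - POSTGRES_USER=druid
    - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    - POSTGRES_DB=druid
  volumes:
    - postgres_data:/var/lib/postgresql/data
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U druid -d druid"]
    interval: 10s
    timeout: 5s
    retries: 5
With this in place, the Coordinator's depends_on can be written as a mapping, e.g. postgres: { condition: service_healthy }, so it only starts once PostgreSQL accepts connections.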
zookeeper
zookeeper:
  image: zookeeper:3.9
  container_name: druid-zk
  restart: unless-stopped
  volumes:
    - zk_data:/data
coordinator
coordinator:
  image: apache/druid:latest
  container_name: druid-coordinator
  restart: unless-stopped
  ports:
    - "${COORDINATOR_PORT:-8081}:8081"
  environment:
    - druid_metadata_storage_type=postgresql
    - druid_zk_service_host=zookeeper
  command: coordinator
  depends_on:
    - postgres
    - zookeeper
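Depending on image defaults, the Coordinator usually also needs connection details for the PostgreSQL metadata store. The druid_* variables below map to Druid's druid.metadata.storage.connector.* properties; the JDBC URL, credentials, and extension list are assumptions based on this recipe's service names, and only the relevant keys are shown:
coordinator:
  image: apache/druid:latest
  command: coordinator
  environment:
    # extension list is an assumption; merge with any other extensions you load
    - druid_extensions_loadList=["postgresql-metadata-storage"]
    - druid_metadata_storage_type=postgresql
    - druid_metadata_storage_connector_connectURI=jdbc:postgresql://postgres:5432/druid
    - druid_metadata_storage_connector_user=druid
    - druid_metadata_storage_connector_password=${POSTGRES_PASSWORD}
    - druid_zk_service_host=zookeeper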
broker
broker:
  image: apache/druid:latest
  container_name: druid-broker
  restart: unless-stopped
  ports:
    - "${BROKER_PORT:-8082}:8082"
  environment:
    - druid_zk_service_host=zookeeper
  command: broker
router
router:
  image: apache/druid:latest
  container_name: druid-router
  restart: unless-stopped
  ports:
    - "${ROUTER_PORT:-8888}:8888"
  command: router
Quick Start
terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  postgres:
    image: postgres:15-alpine
    container_name: druid-postgres
    restart: unless-stopped
    environment:
      - POSTGRES_USER=druid
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=druid
    volumes:
      - postgres_data:/var/lib/postgresql/data

  zookeeper:
    image: zookeeper:3.9
    container_name: druid-zk
    restart: unless-stopped
    volumes:
      - zk_data:/data

  coordinator:
    image: apache/druid:latest
    container_name: druid-coordinator
    restart: unless-stopped
    ports:
      - "${COORDINATOR_PORT:-8081}:8081"
    environment:
      - druid_metadata_storage_type=postgresql
      - druid_zk_service_host=zookeeper
    command: coordinator
    depends_on:
      - postgres
      - zookeeper

  broker:
    image: apache/druid:latest
    container_name: druid-broker
    restart: unless-stopped
    ports:
      - "${BROKER_PORT:-8082}:8082"
    environment:
      - druid_zk_service_host=zookeeper
    command: broker

  router:
    image: apache/druid:latest
    container_name: druid-router
    restart: unless-stopped
    ports:
      - "${ROUTER_PORT:-8888}:8888"
    command: router

volumes:
  postgres_data:
  zk_data:
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# Apache Druid
COORDINATOR_PORT=8081
BROKER_PORT=8082
ROUTER_PORT=8888
POSTGRES_PASSWORD=druid_password
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f

One-Liner
Run this command to download and set up the recipe in one step:
terminal
curl -fsSL https://docker.recipes/api/recipes/druid-analytics-cluster/run | bash

Troubleshooting
- Coordinator shows 'No segments available' error: Verify Historical nodes are running and connected to ZooKeeper, check segment loading rules in Coordinator console
- Queries timeout or return incomplete results: Increase Broker query timeout settings and verify all Historical nodes are healthy and segments are distributed
- PostgreSQL connection failures during startup: Ensure PostgreSQL is fully initialized before Druid services start, add healthcheck dependencies between containers
- ZooKeeper connection lost errors: Verify ZooKeeper has sufficient memory allocation and stable network connectivity, check ZooKeeper logs for session timeouts
- High memory usage and OOM crashes: Adjust JVM heap settings using DRUID_XMX and DRUID_XMS environment variables based on available container resources
- Ingestion tasks fail with metadata errors: Check PostgreSQL disk space and connection pool settings, verify Druid metadata tables are properly initialized
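For the metadata-related issues above, two quick checks (container and database names follow this recipe's defaults):
terminal
# List the Druid metadata tables inside the PostgreSQL container
docker exec -it druid-postgres psql -U druid -d druid -c '\dt'

# Check free disk space under the PostgreSQL data directory
docker exec -it druid-postgres df -h /var/lib/postgresql/data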