Apache Druid Analytics
Apache Druid for real-time analytics, backed by PostgreSQL metadata storage and ZooKeeper coordination.
Overview
Apache Druid is a high-performance, column-oriented, distributed data store designed for fast slice-and-dice analytics on large datasets. Originally developed at Metamarkets and later open-sourced, Druid excels at ingesting streaming data and providing sub-second query responses for OLAP workloads, making it ideal for real-time dashboards, user-facing analytics, and interactive data exploration. Unlike traditional databases, Druid can pre-aggregate (roll up) data at ingestion time and uses advanced indexing techniques to achieve very fast query performance on time-series data.
This stack combines Druid's distributed architecture with PostgreSQL as the metadata store and ZooKeeper for service coordination. PostgreSQL maintains Druid's cluster metadata, segment information, and configuration data with ACID compliance, while ZooKeeper handles service discovery, leader election, and coordination between Druid's various node types. The configuration deploys three essential Druid services: the Coordinator manages data availability and balancing, the Broker handles query routing and result merging, and the Router provides a unified query endpoint for client applications. Historical and MiddleManager nodes, which store segments and run ingestion tasks, are not part of this minimal configuration and must be added for a fully working cluster (see Usage Notes).
Data engineers and analysts working with high-volume streaming data should consider this stack when traditional databases cannot meet sub-second query requirements. This combination particularly benefits organizations ingesting millions of events per day from sources like web analytics, IoT sensors, or application logs, where users need to drill down into data with complex filters and aggregations in real time. The stack scales horizontally and provides the foundation for building responsive analytics applications that can handle both batch and streaming data ingestion patterns.
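Once the stack is up, the Router is the single endpoint client applications talk to. As a quick sketch (assuming the default ports from the .env template below), you can confirm the cluster is responding by hitting the Router's status API and running a Druid SQL query against the built-in sys schema:
terminal
# Confirm the Router is responding (returns version and module info as JSON)
curl -s http://localhost:8888/status

# List the services that have joined the cluster; the Router proxies
# Druid SQL queries to a Broker
curl -s -X POST http://localhost:8888/druid/v2/sql \
  -H 'Content-Type: application/json' \
  -d '{"query": "SELECT server, server_type FROM sys.servers"}'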
Key Features
- Sub-second OLAP queries on billion-row datasets through columnar storage and bitmap indexing
- Real-time data ingestion from Kafka streams with automatic rollup and pre-aggregation
- Automatic data tiering between hot and cold storage based on time-based rules
- Built-in approximation algorithms for fast cardinality estimation and quantile calculations (example query after this list)
- Native JSON support for nested data structures and flexible schema evolution
- Automatic query parallelization across Historical nodes with intelligent data pruning
- PostgreSQL-backed metadata storage ensuring ACID compliance for cluster state
- ZooKeeper coordination enabling automatic failover and service discovery across nodes
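The approximation functions above are exposed directly in Druid SQL. The sketch below runs an approximate distinct count per hour through the Router; the clickstream datasource and user_id column are hypothetical placeholders for whatever your ingestion spec defines:
terminal
# Hypothetical datasource and column names -- replace with your own
curl -s -X POST http://localhost:8888/druid/v2/sql \
  -H 'Content-Type: application/json' \
  -d '{"query": "SELECT FLOOR(__time TO HOUR) AS hour, APPROX_COUNT_DISTINCT(user_id) AS unique_users FROM clickstream GROUP BY 1 ORDER BY 1 DESC LIMIT 24"}'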
Common Use Cases
- Real-time web analytics dashboards showing user behavior and conversion funnels
- IoT sensor data analysis for manufacturing equipment monitoring and predictive maintenance
- Financial trading platforms requiring millisecond query responses on market data
- Gaming analytics tracking player behavior and in-game economy metrics
- Advertising technology platforms analyzing bid requests and campaign performance
- Network monitoring systems processing log data for security and performance insights
- E-commerce recommendation engines analyzing clickstream data for personalization
Prerequisites
- Minimum 4GB RAM (Druid Broker and Coordinator require 1-2GB each, plus PostgreSQL and ZooKeeper overhead)
- Available ports 8081 (Coordinator), 8082 (Broker), 8888 (Router), 5432 (PostgreSQL), 2181 (ZooKeeper)
- Understanding of OLAP concepts, time-series data modeling, and data warehouse architectures
- Familiarity with JSON-based data ingestion specifications and Druid's segment structure
- Basic knowledge of JVM tuning for optimal memory allocation across Druid services (a heap-sizing sketch follows this list)
- Experience with PostgreSQL administration for metadata store maintenance and backups
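The apache/druid image reads the DRUID_XMS and DRUID_XMX environment variables to size each service's JVM heap (see Troubleshooting). A minimal sketch for the Broker, with only the relevant keys shown; the values are assumptions sized for the 4GB minimum rather than tuned recommendations:
broker:
  image: apache/druid:latest
  command: broker
  environment:
    - druid_zk_service_host=zookeeper
    - DRUID_XMS=512m   # initial JVM heap
    - DRUID_XMX=512m   # maximum JVM heap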
For development & testing only. Review security settings, change the default credentials (POSTGRES_PASSWORD in particular), and test thoroughly before production use.
docker-compose.yml
services:
  postgres:
    image: postgres:15-alpine
    container_name: druid-postgres
    restart: unless-stopped
    environment:
      - POSTGRES_USER=druid
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=druid
    volumes:
      - postgres_data:/var/lib/postgresql/data

  zookeeper:
    image: zookeeper:3.9
    container_name: druid-zk
    restart: unless-stopped
    volumes:
      - zk_data:/data

  coordinator:
    image: apache/druid:latest
    container_name: druid-coordinator
    restart: unless-stopped
    ports:
      - "${COORDINATOR_PORT:-8081}:8081"
    environment:
      - druid_metadata_storage_type=postgresql
      - druid_zk_service_host=zookeeper
    command: coordinator
    depends_on:
      - postgres
      - zookeeper

  broker:
    image: apache/druid:latest
    container_name: druid-broker
    restart: unless-stopped
    ports:
      - "${BROKER_PORT:-8082}:8082"
    environment:
      - druid_zk_service_host=zookeeper
    command: broker

  router:
    image: apache/druid:latest
    container_name: druid-router
    restart: unless-stopped
    ports:
      - "${ROUTER_PORT:-8888}:8888"
    command: router

volumes:
  postgres_data:
  zk_data:

.env Template
.env
# Apache Druid
COORDINATOR_PORT=8081
BROKER_PORT=8082
ROUTER_PORT=8888
POSTGRES_PASSWORD=druid_password

Usage Notes
- Druid Console at http://localhost:8888 (a quick smoke test is sketched below)
- Coordinator at http://localhost:8081
- Add Historical and MiddleManager nodes for production data storage and ingestion
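A minimal smoke test and teardown, assuming the default ports and this recipe's compose file:
terminal
# Each Druid service answers /status/health with "true" once it is up
curl -s http://localhost:8081/status/health   # Coordinator
curl -s http://localhost:8082/status/health   # Broker
curl -s http://localhost:8888/status/health   # Router

# Stop the stack; add -v to also delete the postgres_data and zk_data
# volumes, wiping metadata and ZooKeeper state for a fresh start
docker compose down
docker compose down -v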
Individual Services (5 services)
Copy individual services to mix and match with your existing compose files.
postgres
postgres:
  image: postgres:15-alpine
  container_name: druid-postgres
  restart: unless-stopped
  environment:
    - POSTGRES_USER=druid
    - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    - POSTGRES_DB=druid
  volumes:
    - postgres_data:/var/lib/postgresql/data
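Optional hardening against the startup race noted under Troubleshooting: give PostgreSQL a healthcheck so the Druid services can wait on it. A sketch of the amended service (the healthcheck timings are assumptions):
postgres:
  image: postgres:15-alpine
  container_name: druid-postgres
  restart: unless-stopped
  environment:
    - POSTGRES_USER=druid
    - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    - POSTGRES_DB=druid
  volumes:
    - postgres_data:/var/lib/postgresql/data
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U druid -d druid"]
    interval: 10s
    timeout: 5s
    retries: 5
With this in place, the Coordinator's depends_on can be written as a mapping, e.g. postgres: { condition: service_healthy }, so it only starts once PostgreSQL accepts connections.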
zookeeper
zookeeper:
  image: zookeeper:3.9
  container_name: druid-zk
  restart: unless-stopped
  volumes:
    - zk_data:/data
coordinator
coordinator:
  image: apache/druid:latest
  container_name: druid-coordinator
  restart: unless-stopped
  ports:
    - "${COORDINATOR_PORT:-8081}:8081"
  environment:
    - druid_metadata_storage_type=postgresql
    - druid_zk_service_host=zookeeper
  command: coordinator
  depends_on:
    - postgres
    - zookeeper
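Depending on image defaults, the Coordinator usually also needs connection details for the PostgreSQL metadata store. The druid_* variables below map to Druid's druid.metadata.storage.connector.* properties; the JDBC URL, credentials, and extension list are assumptions based on this recipe's service names, and only the relevant keys are shown:
coordinator:
  image: apache/druid:latest
  command: coordinator
  environment:
    # extension list is an assumption; merge with any other extensions you load
    - druid_extensions_loadList=["postgresql-metadata-storage"]
    - druid_metadata_storage_type=postgresql
    - druid_metadata_storage_connector_connectURI=jdbc:postgresql://postgres:5432/druid
    - druid_metadata_storage_connector_user=druid
    - druid_metadata_storage_connector_password=${POSTGRES_PASSWORD}
    - druid_zk_service_host=zookeeper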
broker
broker:
  image: apache/druid:latest
  container_name: druid-broker
  restart: unless-stopped
  ports:
    - "${BROKER_PORT:-8082}:8082"
  environment:
    - druid_zk_service_host=zookeeper
  command: broker
router
router:
  image: apache/druid:latest
  container_name: druid-router
  restart: unless-stopped
  ports:
    - "${ROUTER_PORT:-8888}:8888"
  command: router
Quick Start
terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  postgres:
    image: postgres:15-alpine
    container_name: druid-postgres
    restart: unless-stopped
    environment:
      - POSTGRES_USER=druid
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - POSTGRES_DB=druid
    volumes:
      - postgres_data:/var/lib/postgresql/data

  zookeeper:
    image: zookeeper:3.9
    container_name: druid-zk
    restart: unless-stopped
    volumes:
      - zk_data:/data

  coordinator:
    image: apache/druid:latest
    container_name: druid-coordinator
    restart: unless-stopped
    ports:
      - "${COORDINATOR_PORT:-8081}:8081"
    environment:
      - druid_metadata_storage_type=postgresql
      - druid_zk_service_host=zookeeper
    command: coordinator
    depends_on:
      - postgres
      - zookeeper

  broker:
    image: apache/druid:latest
    container_name: druid-broker
    restart: unless-stopped
    ports:
      - "${BROKER_PORT:-8082}:8082"
    environment:
      - druid_zk_service_host=zookeeper
    command: broker

  router:
    image: apache/druid:latest
    container_name: druid-router
    restart: unless-stopped
    ports:
      - "${ROUTER_PORT:-8888}:8888"
    command: router

volumes:
  postgres_data:
  zk_data:
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# Apache Druid
COORDINATOR_PORT=8081
BROKER_PORT=8082
ROUTER_PORT=8888
POSTGRES_PASSWORD=druid_password
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f

One-Liner
Run this command to download and set up the recipe in one step:
terminal
curl -fsSL https://docker.recipes/api/recipes/druid-analytics-cluster/run | bash

Troubleshooting
- Coordinator shows 'No segments available' error: Verify Historical nodes are running and connected to ZooKeeper, check segment loading rules in Coordinator console
- Queries timeout or return incomplete results: Increase Broker query timeout settings and verify all Historical nodes are healthy and segments are distributed
- PostgreSQL connection failures during startup: Ensure PostgreSQL is fully initialized before Druid services start, add healthcheck dependencies between containers
- ZooKeeper connection lost errors: Verify ZooKeeper has sufficient memory allocation and stable network connectivity, check ZooKeeper logs for session timeouts
- High memory usage and OOM crashes: Adjust JVM heap settings using DRUID_XMX and DRUID_XMS environment variables based on available container resources
- Ingestion tasks fail with metadata errors: Check PostgreSQL disk space and connection pool settings, verify Druid metadata tables are properly initialized
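For the metadata-related issues above, two quick checks (container and database names follow this recipe's defaults):
terminal
# List the Druid metadata tables inside the PostgreSQL container
docker exec -it druid-postgres psql -U druid -d druid -c '\dt'

# Check free disk space under the PostgreSQL data directory
docker exec -it druid-postgres df -h /var/lib/postgresql/data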