Apache Flink Streaming Cluster
Apache Flink for real-time stream processing with job manager and task managers.
Overview
Apache Flink is a distributed stream processing framework designed for high-throughput, low-latency real-time data processing. Originally developed at the Technical University of Berlin and later donated to the Apache Software Foundation, Flink has become a cornerstone technology for organizations requiring millisecond-level processing of continuous data streams. The flink-jobmanager serves as the cluster coordinator, managing job scheduling, checkpointing, and resource allocation, while flink-taskmanager instances handle the actual data processing workload across distributed computing resources.
This streaming cluster architecture separates concerns between job orchestration and task execution, enabling horizontal scalability and fault tolerance. The jobmanager maintains the master state, coordinates checkpoints for exactly-once processing guarantees, and manages the directed acyclic graph (DAG) of streaming operations. Multiple taskmanager instances provide the computational muscle, each offering configurable task slots that can execute parallel operations on data streams, automatically handling backpressure and state management.
Data engineers and platform teams building real-time analytics pipelines will find this configuration invaluable for processing high-velocity data streams from sources like Apache Kafka, Amazon Kinesis, or custom message brokers. The multi-taskmanager setup provides immediate horizontal scaling capabilities, while the centralized jobmanager ensures consistent job lifecycle management and monitoring through Flink's comprehensive web dashboard and metrics system.
Key Features
- JobManager cluster coordination with automatic leader election and high availability
- Configurable task slots per TaskManager for fine-tuned parallelism control
- Built-in exactly-once processing semantics through distributed checkpointing
- Event-time processing with watermark generation for out-of-order data handling
- Integrated backpressure handling across the entire processing pipeline
- Rich windowing operations including tumbling, sliding, and session windows
- SQL interface support for stream processing with dynamic table concepts
- Savepoint mechanism for job versioning and cluster migration
Common Use Cases
- 1Real-time fraud detection for financial transactions with sub-second alerting
- 2IoT sensor data aggregation and anomaly detection for manufacturing systems
- 3Live recommendation engine updates based on user clickstream analysis
- 4Real-time ETL pipelines for data lake ingestion with schema evolution
- 5Complex event processing for supply chain monitoring and optimization
- 6Live dashboard metrics calculation from application logs and events
- 7Stream-to-stream joins for enriching data with external reference information
Prerequisites
- Minimum 4GB RAM per TaskManager instance plus 2GB for JobManager
- Port 8081 available for Flink Web Dashboard access
- Understanding of stream processing concepts and event-time semantics
- Familiarity with Flink job submission via JAR files or SQL queries
- Basic knowledge of parallelism configuration and task slot allocation
- Experience with checkpoint and savepoint management for production deployments
For development & testing. Review security settings, change default credentials, and test thoroughly before production use. See Terms
docker-compose.yml
docker-compose.yml
1services: 2 jobmanager: 3 image: flink:latest4 container_name: flink-jobmanager5 restart: unless-stopped6 ports: 7 - "${FLINK_WEBUI_PORT:-8081}:8081"8 command: jobmanager9 environment: 10 - |11 FLINK_PROPERTIES=12 jobmanager.rpc.address: jobmanager13 parallelism.default: 214 volumes: 15 - flink_data:/opt/flink/data16 networks: 17 - flink-network1819 taskmanager-1: 20 image: flink:latest21 container_name: flink-taskmanager-122 restart: unless-stopped23 depends_on: 24 - jobmanager25 command: taskmanager26 environment: 27 - |28 FLINK_PROPERTIES=29 jobmanager.rpc.address: jobmanager30 taskmanager.numberOfTaskSlots: ${TASK_SLOTS:-4}31 networks: 32 - flink-network3334 taskmanager-2: 35 image: flink:latest36 container_name: flink-taskmanager-237 restart: unless-stopped38 depends_on: 39 - jobmanager40 command: taskmanager41 environment: 42 - |43 FLINK_PROPERTIES=44 jobmanager.rpc.address: jobmanager45 taskmanager.numberOfTaskSlots: ${TASK_SLOTS:-4}46 networks: 47 - flink-network4849volumes: 50 flink_data: 5152networks: 53 flink-network: 54 driver: bridge.env Template
.env
1# Apache Flink2FLINK_WEBUI_PORT=80813TASK_SLOTS=4Usage Notes
- 1Flink Dashboard at http://localhost:8081
- 2Submit jobs via web UI or CLI
- 3Scale task managers as needed
- 4Great for real-time analytics
Individual Services(3 services)
Copy individual services to mix and match with your existing compose files.
jobmanager
jobmanager:
image: flink:latest
container_name: flink-jobmanager
restart: unless-stopped
ports:
- ${FLINK_WEBUI_PORT:-8081}:8081
command: jobmanager
environment:
- |
FLINK_PROPERTIES=
jobmanager.rpc.address: jobmanager
parallelism.default: 2
volumes:
- flink_data:/opt/flink/data
networks:
- flink-network
taskmanager-1
taskmanager-1:
image: flink:latest
container_name: flink-taskmanager-1
restart: unless-stopped
depends_on:
- jobmanager
command: taskmanager
environment:
- |
FLINK_PROPERTIES=
jobmanager.rpc.address: jobmanager
taskmanager.numberOfTaskSlots: ${TASK_SLOTS:-4}
networks:
- flink-network
taskmanager-2
taskmanager-2:
image: flink:latest
container_name: flink-taskmanager-2
restart: unless-stopped
depends_on:
- jobmanager
command: taskmanager
environment:
- |
FLINK_PROPERTIES=
jobmanager.rpc.address: jobmanager
taskmanager.numberOfTaskSlots: ${TASK_SLOTS:-4}
networks:
- flink-network
Quick Start
terminal
1# 1. Create the compose file2cat > docker-compose.yml << 'EOF'3services:4 jobmanager:5 image: flink:latest6 container_name: flink-jobmanager7 restart: unless-stopped8 ports:9 - "${FLINK_WEBUI_PORT:-8081}:8081"10 command: jobmanager11 environment:12 - |13 FLINK_PROPERTIES=14 jobmanager.rpc.address: jobmanager15 parallelism.default: 216 volumes:17 - flink_data:/opt/flink/data18 networks:19 - flink-network2021 taskmanager-1:22 image: flink:latest23 container_name: flink-taskmanager-124 restart: unless-stopped25 depends_on:26 - jobmanager27 command: taskmanager28 environment:29 - |30 FLINK_PROPERTIES=31 jobmanager.rpc.address: jobmanager32 taskmanager.numberOfTaskSlots: ${TASK_SLOTS:-4}33 networks:34 - flink-network3536 taskmanager-2:37 image: flink:latest38 container_name: flink-taskmanager-239 restart: unless-stopped40 depends_on:41 - jobmanager42 command: taskmanager43 environment:44 - |45 FLINK_PROPERTIES=46 jobmanager.rpc.address: jobmanager47 taskmanager.numberOfTaskSlots: ${TASK_SLOTS:-4}48 networks:49 - flink-network5051volumes:52 flink_data:5354networks:55 flink-network:56 driver: bridge57EOF5859# 2. Create the .env file60cat > .env << 'EOF'61# Apache Flink62FLINK_WEBUI_PORT=808163TASK_SLOTS=464EOF6566# 3. Start the services67docker compose up -d6869# 4. View logs70docker compose logs -fOne-Liner
Run this command to download and set up the recipe in one step:
terminal
1curl -fsSL https://docker.recipes/api/recipes/apache-flink-cluster/run | bashTroubleshooting
- TaskManager not connecting to JobManager: Verify network connectivity and ensure jobmanager.rpc.address matches the service name
- OutOfMemoryError in TaskManager: Increase container memory limits or reduce taskmanager.memory.process.size in configuration
- Jobs failing with checkpoint timeout: Increase execution.checkpointing.timeout and check storage backend performance
- High backpressure warnings: Scale up TaskManager instances or optimize downstream operators and sinks
- Web UI shows 'No TaskManagers available': Check TaskManager logs for startup errors and verify FLINK_PROPERTIES formatting
- Job deployment fails with ClassNotFoundException: Ensure all required dependencies are packaged in the job JAR file
Community Notes
Loading...
Loading notes...
Download Recipe Kit
Get all files in a ready-to-deploy package
Includes docker-compose.yml, .env template, README, and license
Components
flink-jobmanagerflink-taskmanager
Tags
#flink#streaming#real-time#big-data#analytics
Category
Database StacksAd Space
Shortcuts: C CopyF FavoriteD Download