Apache Flink
Stream processing framework for real-time analytics.
Overview
Apache Flink is a distributed stream processing framework designed for high-throughput, low-latency real-time data analytics. Originally developed at the Technical University of Berlin and later donated to the Apache Software Foundation, Flink provides stateful computations over unbounded and bounded data streams with exactly-once processing guarantees. Unlike batch processing systems adapted for streaming, Flink was built from the ground up as a true streaming engine that treats batch as a special case of streaming, enabling complex event processing, real-time machine learning, and continuous analytics at scale.
This configuration deploys a complete Flink cluster with separate JobManager and TaskManager components running in coordinated containers. The JobManager serves as the cluster coordinator, handling job scheduling, checkpointing, and recovery, while multiple TaskManager instances execute the actual data processing tasks with configurable parallelism. The setup enables distributed processing across multiple task slots, allowing complex streaming applications to scale horizontally while maintaining Flink's low-latency processing characteristics and fault-tolerance guarantees.
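Processing capacity scales along two axes: slots per worker (taskmanager.numberOfTaskSlots) and number of workers (deploy.replicas). A minimal sketch of a larger worker pool, assuming the same image and network as the recipe below; the slot count of 4 and replica count of 3 are illustrative values, not tuned recommendations:

  flink-taskmanager:
    image: flink:latest
    command: taskmanager
    environment:
      FLINK_PROPERTIES: |
        jobmanager.rpc.address: flink-jobmanager
        taskmanager.numberOfTaskSlots: 4   # slots per worker
    deploy:
      replicas: 3                          # 3 workers x 4 slots = 12 task slots total
    depends_on:
      - flink-jobmanager
    networks:
      - flink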
Data engineers building real-time analytics pipelines, ML engineers implementing streaming machine learning models, and organizations requiring sub-second processing of high-volume event streams will find this stack invaluable. The combination of Flink's advanced windowing capabilities, exactly-once state consistency, and event-time processing makes it particularly suitable for financial trading systems, IoT sensor networks, and operational monitoring platforms where data freshness and processing accuracy are critical business requirements.
Key Features
- True streaming engine with event-time processing and advanced windowing semantics
- Exactly-once state consistency through distributed snapshots and checkpointing (see the configuration sketch after this list)
- JobManager cluster coordination with automatic failover and job recovery
- Multi-slot TaskManager deployment supporting configurable parallelism levels
- DataStream API, Table API, and SQL interface for diverse development approaches
- Built-in backpressure handling and dynamic load balancing across TaskManagers
- Savepoint mechanism for application versioning and zero-downtime upgrades
- Web dashboard for real-time job monitoring, metrics visualization, and checkpoint inspection
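Checkpointing is not enabled by default; it is switched on through Flink configuration, which this recipe passes via the FLINK_PROPERTIES environment variable. A minimal sketch, assuming a 30-second interval (an illustrative value) and a container-local checkpoint path that needs a persistent volume for real fault tolerance:

      FLINK_PROPERTIES: |
        jobmanager.rpc.address: flink-jobmanager
        execution.checkpointing.interval: 30s
        execution.checkpointing.mode: EXACTLY_ONCE
        state.checkpoints.dir: file:///opt/flink/checkpoints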
Common Use Cases
- Real-time fraud detection systems processing credit card transactions with sub-second latency
- IoT sensor data aggregation and anomaly detection for manufacturing equipment monitoring
- Financial market data processing for algorithmic trading and risk management systems
- Live recommendation engines updating user profiles based on streaming behavioral data
- Operational monitoring dashboards aggregating logs and metrics from distributed systems
- Real-time ETL pipelines transforming and enriching streaming data before warehouse loading
- Complex event processing for supply chain optimization and logistics tracking
Prerequisites
- Minimum 4GB RAM allocated to Docker (2GB for JobManager, 1GB per TaskManager; see the memory sketch after this list)
- Port 8081 available for Flink Web UI and job submission interface
- Understanding of stream processing concepts and Flink job lifecycle management
- Java development environment if building custom Flink applications locally
- Knowledge of checkpoint storage requirements for production fault tolerance
- Familiarity with Flink's event-time vs processing-time semantics for windowing operations
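The memory figures above map onto Flink's process-size settings, passed through the same FLINK_PROPERTIES block the recipe already uses. A sketch matching the 2GB/1GB split; the sizes are starting points, not tuned values:

  # In the flink-jobmanager service
  environment:
    FLINK_PROPERTIES: |
      jobmanager.rpc.address: flink-jobmanager
      jobmanager.memory.process.size: 2048m

  # In the flink-taskmanager service
  environment:
    FLINK_PROPERTIES: |
      jobmanager.rpc.address: flink-jobmanager
      taskmanager.memory.process.size: 1024m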
For development and testing. Review security settings, change default credentials, and test thoroughly before production use.
docker-compose.yml
services:
  flink-jobmanager:
    image: flink:latest
    container_name: flink-jobmanager
    command: jobmanager
    environment:
      FLINK_PROPERTIES: |
        jobmanager.rpc.address: flink-jobmanager
    ports:
      - "8081:8081"
    networks:
      - flink

  flink-taskmanager:
    image: flink:latest
    command: taskmanager
    environment:
      FLINK_PROPERTIES: |
        jobmanager.rpc.address: flink-jobmanager
        taskmanager.numberOfTaskSlots: 2
    deploy:
      replicas: 2
    depends_on:
      - flink-jobmanager
    networks:
      - flink

networks:
  flink:
    driver: bridge
.env Template
.env
# Configure task slots per worker
Usage Notes
- Docs: https://nightlies.apache.org/flink/flink-docs-stable/
- Web UI at http://localhost:8081 - job graph, metrics, checkpoints
- Submit: ./bin/flink run -m localhost:8081 job.jar (see the container-based sketch after this list)
- Scale TaskManagers via deploy.replicas for more parallelism
- Exactly-once semantics with checkpointing and savepoints
- Supports DataStream API, Table API, and SQL for streaming
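Jobs can also be submitted from inside the JobManager container, which avoids installing a local Flink distribution. A sketch assuming the example jars that ship with the official flink image under /opt/flink/examples/streaming, and that the CLI inside the container picks up the generated configuration:

# Submit the bundled WordCount example
docker compose exec flink-jobmanager \
  flink run /opt/flink/examples/streaming/WordCount.jar

# List running and scheduled jobs
docker compose exec flink-jobmanager flink list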
Individual Services (2 services)
Copy individual services to mix and match with your existing compose files.
flink-jobmanager
flink-jobmanager:
  image: flink:latest
  container_name: flink-jobmanager
  command: jobmanager
  environment:
    FLINK_PROPERTIES: |
      jobmanager.rpc.address: flink-jobmanager
  ports:
    - "8081:8081"
  networks:
    - flink
flink-taskmanager
flink-taskmanager:
  image: flink:latest
  command: taskmanager
  environment:
    FLINK_PROPERTIES: |
      jobmanager.rpc.address: flink-jobmanager
      taskmanager.numberOfTaskSlots: 2
  deploy:
    replicas: 2
  depends_on:
    - flink-jobmanager
  networks:
    - flink
Quick Start
terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  flink-jobmanager:
    image: flink:latest
    container_name: flink-jobmanager
    command: jobmanager
    environment:
      FLINK_PROPERTIES: |
        jobmanager.rpc.address: flink-jobmanager
    ports:
      - "8081:8081"
    networks:
      - flink

  flink-taskmanager:
    image: flink:latest
    command: taskmanager
    environment:
      FLINK_PROPERTIES: |
        jobmanager.rpc.address: flink-jobmanager
        taskmanager.numberOfTaskSlots: 2
    deploy:
      replicas: 2
    depends_on:
      - flink-jobmanager
    networks:
      - flink

networks:
  flink:
    driver: bridge
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# Configure task slots per worker
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f
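Once the services are up, the REST API behind the Web UI can confirm that the TaskManagers registered. A quick check, assuming the default port mapping from the compose file:

# Should report 2 taskmanagers and 4 total task slots for this recipe
curl http://localhost:8081/overview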
One-Liner
Run this command to download and set up the recipe in one step:
terminal
curl -fsSL https://docker.recipes/api/recipes/flink/run | bash
Troubleshooting
- JobManager unreachable errors: Verify FLINK_PROPERTIES environment variable contains correct jobmanager.rpc.address setting
- TaskManager registration failures: Ensure TaskManager containers can resolve jobmanager hostname through Docker network
- Out of memory errors during job execution: Increase taskmanager.memory.process.size or reduce taskmanager.numberOfTaskSlots
- Checkpoint failures with 'Could not flush and close' errors: Configure persistent checkpoint storage instead of the local filesystem (see the volume sketch after this list)
- Jobs stuck in RUNNING state without progress: Check for backpressure issues and verify source connector connectivity
- Web UI showing 'Connection refused' at localhost:8081: Confirm JobManager container started successfully and port mapping is correct
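For the checkpoint storage issue above, a single-host Compose setup can use a named volume mounted at the same path in every container, since the JobManager and all TaskManagers must see the same checkpoint directory; production clusters typically point state.checkpoints.dir at S3 or HDFS instead. A sketch, with the volume name flink-checkpoints chosen for illustration:

services:
  flink-jobmanager:
    # ...existing settings from the recipe...
    environment:
      FLINK_PROPERTIES: |
        jobmanager.rpc.address: flink-jobmanager
        state.checkpoints.dir: file:///opt/flink/checkpoints
    volumes:
      - flink-checkpoints:/opt/flink/checkpoints

  flink-taskmanager:
    # ...existing settings from the recipe...
    volumes:
      - flink-checkpoints:/opt/flink/checkpoints  # same path on every node

volumes:
  flink-checkpoints: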