docker.recipes

Apache Flink Streaming Cluster

advanced

Apache Flink for real-time stream processing with job manager and task managers.

Overview

Apache Flink is a distributed stream processing framework designed for high-throughput, low-latency real-time data processing. Originally developed at the Technical University of Berlin and later donated to the Apache Software Foundation, Flink has become a cornerstone technology for organizations requiring millisecond-level processing of continuous data streams. The flink-jobmanager serves as the cluster coordinator, managing job scheduling, checkpointing, and resource allocation, while flink-taskmanager instances handle the actual data processing workload across distributed computing resources.

This streaming cluster architecture separates concerns between job orchestration and task execution, enabling horizontal scalability and fault tolerance. The jobmanager maintains the master state, coordinates checkpoints for exactly-once processing guarantees, and manages the directed acyclic graph (DAG) of streaming operations. Multiple taskmanager instances provide the computational muscle, each offering configurable task slots that can execute parallel operations on data streams, automatically handling backpressure and state management.

Data engineers and platform teams building real-time analytics pipelines will find this configuration invaluable for processing high-velocity data streams from sources like Apache Kafka, Amazon Kinesis, or custom message brokers. The multi-taskmanager setup provides immediate horizontal scaling capabilities, while the centralized jobmanager ensures consistent job lifecycle management and monitoring through Flink's comprehensive web dashboard and metrics system.

Key Features

  • JobManager cluster coordination with automatic leader election and high availability
  • Configurable task slots per TaskManager for fine-tuned parallelism control
  • Built-in exactly-once processing semantics through distributed checkpointing
  • Event-time processing with watermark generation for out-of-order data handling
  • Integrated backpressure handling across the entire processing pipeline
  • Rich windowing operations including tumbling, sliding, and session windows
  • SQL interface support for stream processing with dynamic table concepts
  • Savepoint mechanism for job versioning and cluster migration
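To make the windowing feature above concrete, here is a minimal, framework-free sketch of how a tumbling event-time window groups records. This is plain Python illustrating the concept, not the Flink API; the function name and event shape are invented for the example:

```python
from collections import defaultdict

def tumbling_window_counts(events, window_size_ms):
    """Assign (timestamp_ms, key) events to fixed, non-overlapping
    windows and count occurrences per (window_start, key)."""
    windows = defaultdict(int)
    for ts, key in events:
        # A tumbling window of size W covers [n*W, (n+1)*W)
        window_start = (ts // window_size_ms) * window_size_ms
        windows[(window_start, key)] += 1
    return dict(windows)

# Four events falling into three 1-second windows
events = [(1000, "a"), (1500, "a"), (2500, "b"), (3100, "a")]
print(tumbling_window_counts(events, 1000))
# → {(1000, 'a'): 2, (2000, 'b'): 1, (3000, 'a'): 1}
```

In Flink the same grouping is expressed declaratively (e.g. a keyed stream with a one-second tumbling event-time window), with watermarks deciding when a window may close despite out-of-order arrivals.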

Common Use Cases

  • Real-time fraud detection for financial transactions with sub-second alerting
  • IoT sensor data aggregation and anomaly detection for manufacturing systems
  • Live recommendation engine updates based on user clickstream analysis
  • Real-time ETL pipelines for data lake ingestion with schema evolution
  • Complex event processing for supply chain monitoring and optimization
  • Live dashboard metrics calculation from application logs and events
  • Stream-to-stream joins for enriching data with external reference information

Prerequisites

  • Minimum 4GB RAM per TaskManager instance plus 2GB for JobManager
  • Port 8081 available for Flink Web Dashboard access
  • Understanding of stream processing concepts and event-time semantics
  • Familiarity with Flink job submission via JAR files or SQL queries
  • Basic knowledge of parallelism configuration and task slot allocation
  • Experience with checkpoint and savepoint management for production deployments
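The port prerequisite above can be verified before bringing the stack up. A small stdlib-only sketch (the port number mirrors the FLINK_WEBUI_PORT default; adjust if you override it):

```python
import socket

def port_is_free(port, host="127.0.0.1"):
    """Return True if we can bind the port, i.e. nothing is listening on it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

if port_is_free(8081):
    print("Port 8081 is free for the Flink Web Dashboard")
else:
    print("Port 8081 is taken - set FLINK_WEBUI_PORT to another port")
```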

For development & testing. Review security settings, change default credentials, and test thoroughly before production use. See Terms

docker-compose.yml

docker-compose.yml
services:
  jobmanager:
    image: flink:latest
    container_name: flink-jobmanager
    restart: unless-stopped
    ports:
      - "${FLINK_WEBUI_PORT:-8081}:8081"
    command: jobmanager
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        parallelism.default: 2
    volumes:
      - flink_data:/opt/flink/data
    networks:
      - flink-network

  taskmanager-1:
    image: flink:latest
    container_name: flink-taskmanager-1
    restart: unless-stopped
    depends_on:
      - jobmanager
    command: taskmanager
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        taskmanager.numberOfTaskSlots: ${TASK_SLOTS:-4}
    networks:
      - flink-network

  taskmanager-2:
    image: flink:latest
    container_name: flink-taskmanager-2
    restart: unless-stopped
    depends_on:
      - jobmanager
    command: taskmanager
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        taskmanager.numberOfTaskSlots: ${TASK_SLOTS:-4}
    networks:
      - flink-network

volumes:
  flink_data:

networks:
  flink-network:
    driver: bridge

.env Template

.env
# Apache Flink
FLINK_WEBUI_PORT=8081
TASK_SLOTS=4

Usage Notes

  1. Flink Dashboard available at http://localhost:8081
  2. Submit jobs via the web UI or the Flink CLI
  3. Scale out by adding further TaskManager services as needed
  4. Well suited to real-time analytics workloads
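The same port that serves the dashboard also exposes Flink's REST API, which is handy for scripted health checks. A sketch using the documented /overview endpoint (field names such as slots-total come from Flink's REST API; the helper names are ours):

```python
import json
from urllib.request import urlopen

def fetch_overview(base_url="http://localhost:8081"):
    """GET the cluster overview from Flink's REST API (requires a running cluster)."""
    with urlopen(f"{base_url}/overview") as resp:
        return json.load(resp)

def summarize_overview(overview):
    """Condense an /overview response into a one-line health summary."""
    return (f"{overview['taskmanagers']} taskmanager(s), "
            f"{overview['slots-available']}/{overview['slots-total']} slots free, "
            f"{overview['jobs-running']} job(s) running")

# Demo with a canned response; with the stack up, use summarize_overview(fetch_overview())
sample = {"taskmanagers": 2, "slots-total": 8, "slots-available": 8, "jobs-running": 0}
print(summarize_overview(sample))
# → 2 taskmanager(s), 8/8 slots free, 0 job(s) running
```

With this recipe's defaults (two TaskManagers at 4 slots each) a healthy idle cluster should report 8/8 slots free.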

Individual Services (3 services)

Copy individual services to mix and match with your existing compose files.

jobmanager
jobmanager:
  image: flink:latest
  container_name: flink-jobmanager
  restart: unless-stopped
  ports:
    - ${FLINK_WEBUI_PORT:-8081}:8081
  command: jobmanager
  environment:
    - |
      FLINK_PROPERTIES=
      jobmanager.rpc.address: jobmanager
      parallelism.default: 2
  volumes:
    - flink_data:/opt/flink/data
  networks:
    - flink-network
taskmanager-1
taskmanager-1:
  image: flink:latest
  container_name: flink-taskmanager-1
  restart: unless-stopped
  depends_on:
    - jobmanager
  command: taskmanager
  environment:
    - |
      FLINK_PROPERTIES=
      jobmanager.rpc.address: jobmanager
      taskmanager.numberOfTaskSlots: ${TASK_SLOTS:-4}
  networks:
    - flink-network
taskmanager-2
taskmanager-2:
  image: flink:latest
  container_name: flink-taskmanager-2
  restart: unless-stopped
  depends_on:
    - jobmanager
  command: taskmanager
  environment:
    - |
      FLINK_PROPERTIES=
      jobmanager.rpc.address: jobmanager
      taskmanager.numberOfTaskSlots: ${TASK_SLOTS:-4}
  networks:
    - flink-network

Quick Start

terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  jobmanager:
    image: flink:latest
    container_name: flink-jobmanager
    restart: unless-stopped
    ports:
      - "${FLINK_WEBUI_PORT:-8081}:8081"
    command: jobmanager
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        parallelism.default: 2
    volumes:
      - flink_data:/opt/flink/data
    networks:
      - flink-network

  taskmanager-1:
    image: flink:latest
    container_name: flink-taskmanager-1
    restart: unless-stopped
    depends_on:
      - jobmanager
    command: taskmanager
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        taskmanager.numberOfTaskSlots: ${TASK_SLOTS:-4}
    networks:
      - flink-network

  taskmanager-2:
    image: flink:latest
    container_name: flink-taskmanager-2
    restart: unless-stopped
    depends_on:
      - jobmanager
    command: taskmanager
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        taskmanager.numberOfTaskSlots: ${TASK_SLOTS:-4}
    networks:
      - flink-network

volumes:
  flink_data:

networks:
  flink-network:
    driver: bridge
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# Apache Flink
FLINK_WEBUI_PORT=8081
TASK_SLOTS=4
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f

One-Liner

Run this command to download and set up the recipe in one step:

terminal
curl -fsSL https://docker.recipes/api/recipes/apache-flink-cluster/run | bash

Troubleshooting

  • TaskManager not connecting to JobManager: Verify network connectivity and ensure jobmanager.rpc.address matches the service name
  • OutOfMemoryError in TaskManager: Increase container memory limits or reduce taskmanager.memory.process.size in configuration
  • Jobs failing with checkpoint timeout: Increase execution.checkpointing.timeout and check storage backend performance
  • High backpressure warnings: Scale up TaskManager instances or optimize downstream operators and sinks
  • Web UI shows 'No TaskManagers available': Check TaskManager logs for startup errors and verify FLINK_PROPERTIES formatting
  • Job deployment fails with ClassNotFoundException: Ensure all required dependencies are packaged in the job JAR file
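Several of the issues above, notably the "No TaskManagers available" case, come down to malformed FLINK_PROPERTIES. A quick sanity check for the block-scalar contents, assuming the convention that each non-empty property line is a `key: value` pair (the leading `FLINK_PROPERTIES=` marker aside):

```python
import re

def check_flink_properties(block):
    """Return the lines that do not look like 'key: value' pairs."""
    bad = []
    for line in block.splitlines():
        line = line.strip()
        if not line:
            continue
        # Flink config keys are dot-separated identifiers followed by ': value'
        if not re.match(r"^[A-Za-z0-9][\w.\-]*\s*:\s*\S", line):
            bad.append(line)
    return bad

props = """
jobmanager.rpc.address: jobmanager
taskmanager.numberOfTaskSlots 4
"""
print(check_flink_properties(props))
# → ['taskmanager.numberOfTaskSlots 4']
```

Here the second line is flagged because the colon is missing, the kind of typo that silently leaves a TaskManager unable to find the JobManager.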


Download Recipe Kit

Get all files in a ready-to-deploy package

Includes docker-compose.yml, .env template, README, and license
