docker.recipes

Apache Flink

advanced

Stream processing framework for real-time analytics.

Overview

Apache Flink is a distributed stream processing framework designed for high-throughput, low-latency real-time data analytics. Originally developed at the Technical University of Berlin and later donated to the Apache Software Foundation, Flink provides stateful computations over unbounded and bounded data streams with exactly-once processing guarantees. Unlike batch processing systems adapted for streaming, Flink was built from the ground up as a true streaming engine that treats batch as a special case of streaming, enabling complex event processing, real-time machine learning, and continuous analytics at scale.

This configuration deploys a complete Flink cluster with separate JobManager and TaskManager components running in coordinated containers. The JobManager serves as the cluster coordinator, handling job scheduling, checkpointing, and recovery, while multiple TaskManager instances execute the actual data processing tasks with configurable parallelism. The setup enables distributed processing across multiple task slots, allowing complex streaming applications to scale horizontally while maintaining Flink's low-latency processing characteristics and fault-tolerance guarantees.

Data engineers building real-time analytics pipelines, ML engineers implementing streaming machine learning models, and organizations requiring sub-second processing of high-volume event streams will find this stack invaluable. The combination of Flink's advanced windowing capabilities, exactly-once state consistency, and event-time processing makes it particularly suitable for financial trading systems, IoT sensor networks, and operational monitoring platforms where data freshness and processing accuracy are critical business requirements.

Key Features

  • True streaming engine with event-time processing and advanced windowing semantics
  • Exactly-once state consistency through distributed snapshots and checkpointing
  • JobManager cluster coordination with automatic failover and job recovery
  • Multi-slot TaskManager deployment supporting configurable parallelism levels
  • DataStream API, Table API, and SQL interface for diverse development approaches
  • Built-in backpressure handling and dynamic load balancing across TaskManagers
  • Savepoint mechanism for application versioning and zero-downtime upgrades
  • Web dashboard for real-time job monitoring, metrics visualization, and checkpoint inspection
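The checkpointing feature above is not enabled by the recipe's default configuration. As a sketch, the JobManager's FLINK_PROPERTIES block could be extended with the settings below; the 10s interval and the /opt/flink/checkpoints path are assumptions (a shared volume or object store mounted at that path is required and is not part of this recipe):

```yaml
    environment:
      FLINK_PROPERTIES: |
        jobmanager.rpc.address: flink-jobmanager
        # Checkpoint interval is an example value; tune per job
        execution.checkpointing.interval: 10s
        # Durable checkpoint location; assumes a volume mounted here on all containers
        state.checkpoints.dir: file:///opt/flink/checkpoints
```

With a local filesystem path, checkpoints only survive restarts if the path is backed by a shared volume; production setups typically point state.checkpoints.dir at object storage instead.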

Common Use Cases

  • Real-time fraud detection systems processing credit card transactions with sub-second latency
  • IoT sensor data aggregation and anomaly detection for manufacturing equipment monitoring
  • Financial market data processing for algorithmic trading and risk management systems
  • Live recommendation engines updating user profiles based on streaming behavioral data
  • Operational monitoring dashboards aggregating logs and metrics from distributed systems
  • Real-time ETL pipelines transforming and enriching streaming data before warehouse loading
  • Complex event processing for supply chain optimization and logistics tracking

Prerequisites

  • Minimum 4GB RAM allocated to Docker (2GB for JobManager, 1GB per TaskManager)
  • Port 8081 available for Flink Web UI and job submission interface
  • Understanding of stream processing concepts and Flink job lifecycle management
  • Java development environment if building custom Flink applications locally
  • Knowledge of checkpoint storage requirements for production fault tolerance
  • Familiarity with Flink's event-time vs processing-time semantics for windowing operations
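The 4GB minimum above can be sanity-checked with quick arithmetic. The container sizes below follow the split stated in the first prerequisite; they are illustrative assumptions, not values this recipe sets:

```shell
# Rough memory budget for the default topology (2 TaskManager replicas).
# Sizes mirror the prerequisite's split: 2GB JobManager, 1GB per TaskManager.
JOBMANAGER_MB=2048
TASKMANAGER_MB=1024
REPLICAS=2

TOTAL_MB=$((JOBMANAGER_MB + TASKMANAGER_MB * REPLICAS))
echo "Total: ${TOTAL_MB} MB"   # 4096 MB, matching the 4GB minimum
```

Scaling replicas or raising taskmanager.memory.process.size increases this total accordingly.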

This recipe is intended for development and testing. Review security settings, change default credentials, and test thoroughly before production use.

docker-compose.yml

docker-compose.yml
services:
  flink-jobmanager:
    image: flink:latest
    container_name: flink-jobmanager
    command: jobmanager
    environment:
      FLINK_PROPERTIES: |
        jobmanager.rpc.address: flink-jobmanager
    ports:
      - "8081:8081"
    networks:
      - flink

  flink-taskmanager:
    image: flink:latest
    command: taskmanager
    environment:
      FLINK_PROPERTIES: |
        jobmanager.rpc.address: flink-jobmanager
        taskmanager.numberOfTaskSlots: 2
    deploy:
      replicas: 2
    depends_on:
      - flink-jobmanager
    networks:
      - flink

networks:
  flink:
    driver: bridge

.env Template

.env
# Configure task slots per worker

Usage Notes

  1. Docs: https://nightlies.apache.org/flink/flink-docs-stable/
  2. Web UI at http://localhost:8081 - job graph, metrics, checkpoints
  3. Submit: ./bin/flink run -m localhost:8081 job.jar
  4. Scale taskmanagers via deploy.replicas for more parallelism
  5. Exactly-once semantics with checkpointing and savepoints
  6. Supports DataStream API, Table API, and SQL for streaming
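Scaling TaskManagers via deploy.replicas can also be done without editing the main file, using a compose override. A sketch (the replica count of 4 is just an example value):

```yaml
# docker-compose.override.yml - overrides the TaskManager replica count
services:
  flink-taskmanager:
    deploy:
      replicas: 4
```

Running docker compose up -d picks up the override automatically; total task slots become replicas × taskmanager.numberOfTaskSlots.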

Individual Services (2 services)

Copy individual services to mix and match with your existing compose files.

flink-jobmanager
flink-jobmanager:
  image: flink:latest
  container_name: flink-jobmanager
  command: jobmanager
  environment:
    FLINK_PROPERTIES: |
      jobmanager.rpc.address: flink-jobmanager
  ports:
    - "8081:8081"
  networks:
    - flink
flink-taskmanager
flink-taskmanager:
  image: flink:latest
  command: taskmanager
  environment:
    FLINK_PROPERTIES: |
      jobmanager.rpc.address: flink-jobmanager
      taskmanager.numberOfTaskSlots: 2
  deploy:
    replicas: 2
  depends_on:
    - flink-jobmanager
  networks:
    - flink

Quick Start

terminal
# 1. Create the compose file
cat > docker-compose.yml << 'EOF'
services:
  flink-jobmanager:
    image: flink:latest
    container_name: flink-jobmanager
    command: jobmanager
    environment:
      FLINK_PROPERTIES: |
        jobmanager.rpc.address: flink-jobmanager
    ports:
      - "8081:8081"
    networks:
      - flink

  flink-taskmanager:
    image: flink:latest
    command: taskmanager
    environment:
      FLINK_PROPERTIES: |
        jobmanager.rpc.address: flink-jobmanager
        taskmanager.numberOfTaskSlots: 2
    deploy:
      replicas: 2
    depends_on:
      - flink-jobmanager
    networks:
      - flink

networks:
  flink:
    driver: bridge
EOF

# 2. Create the .env file
cat > .env << 'EOF'
# Configure task slots per worker
EOF

# 3. Start the services
docker compose up -d

# 4. View logs
docker compose logs -f

One-Liner

Run this command to download and set up the recipe in one step:

terminal
curl -fsSL https://docker.recipes/api/recipes/flink/run | bash

Troubleshooting

  • JobManager unreachable errors: Verify FLINK_PROPERTIES environment variable contains correct jobmanager.rpc.address setting
  • TaskManager registration failures: Ensure TaskManager containers can resolve jobmanager hostname through Docker network
  • Out of memory errors during job execution: Increase taskmanager.memory.process.size or reduce taskmanager.numberOfTaskSlots
  • Checkpoint failures with 'Could not flush and close' errors: Configure persistent checkpoint storage instead of using local filesystem
  • Jobs stuck in RUNNING state without progress: Check for backpressure issues and verify source connector connectivity
  • Web UI showing 'Connection refused' at localhost:8081: Confirm JobManager container started successfully and port mapping is correct
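The last item above can be checked from the command line as well as the browser. A minimal probe against the JobManager's REST API, assuming the stack has been started with docker compose up -d:

```shell
# Probe the JobManager REST API; /overview returns cluster-wide stats
# (TaskManager count, total/available slots) when the cluster is healthy.
if curl -sf http://localhost:8081/overview > /dev/null 2>&1; then
  RESULT="up"
else
  RESULT="down"
fi
echo "JobManager REST endpoint is ${RESULT}"
```

If the endpoint is down, docker compose logs flink-jobmanager usually shows why the container failed to start.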


Download Recipe Kit

Get all files in a ready-to-deploy package

Includes docker-compose.yml, .env template, README, and license
