© 2019 Ververica
David Anderson | @alpinegizmo | Training Coordinator
Getting Started with
Apache Flink® on Kubernetes
2 © 2019 Ververica
About Ververica
Creators of
Apache Flink®
Real Time
Stream Processing
for the Enterprise
3 © 2019 Ververica
Outline
1. Introduction
2. Detailed Example
3. Debugging Tips
4. Future Plans
4 © 2019 Ververica
Why Containers?
• Containers provide isolation at low cost
– Require fewer resources than VMs
– Smaller, boot faster
• Simpler to manage
– Each container does one thing
– Consistent packaging
• Enables flexible and dynamic resource allocation
– Scalable
– Composable
5 © 2019 Ververica
Container Orchestration with Kubernetes
• Declarative configuration:
– You tell K8s the desired state, and a background process makes it happen
• 3 replicas of this container should be kept running
• A load balancer should exist, listening on port 443, backed by container with this label
• Core resource types:
– Pod: a group of one or more containers
– Job: keeps pod(s) running until finished
– Deployment: keeps n pods running indefinitely
– Service: a REST object backed by a set of pods
– Persistent Volume Claim: storage whose lifetime is not coupled to any of the pods
6 © 2019 Ververica
Vision: Flink as a Library
• Makes deployments simpler
– Focus is on deploying/running an application
– You build one, complete job-specific Docker image that includes:
• Your application code
• Flink libraries
• Other dependencies
• Configuration files
7 © 2019 Ververica
Flink’s Runtime Building Blocks
• Cluster framework-specific
• Manages available TaskManagers
• Acquires / releases resources
ResourceManager
TaskManagerJobManager
• Registers with ResourceManager
• Provides “task slots”
• Assigned tasks by one or more JobManagers
• One per job
• Schedules job in terms of "task slots"
• Monitors task execution
• Coordinates checkpointing
Dispatcher
• Touch-point for job submissions
• Spawns JobManagers
8 © 2019 Ververica
Flink’s Runtime Building Blocks
• Cluster framework-specific
• Manages available TaskManagers
• Acquires / releases resources
ResourceManager
TaskManagerJobManager
• Registers with ResourceManager
• Provides “task slots”
• Assigned tasks by JobManager(s)
• One per job
• Schedules job in terms of "task slots"
• Monitors task execution
• Coordinates checkpointing
Dispatcher
• Touch-point for job submissions
• Spawns JobManagers
9 © 2019 Ververica
Runtime Building Blocks (on Yarn)
ResourceManager
(3) Request slots
TaskManager
JobManager
(4) Start TaskManager
(5) Register
(7) Deploy Tasks
Dispatcher
App/Client
(1) Submit Job
(2) Start JobManager
(6) Offer slots
10 © 2019 Ververica
But we’re not quite there yet with K8s
11 © 2019 Ververica
Flink on K8s: current status
• Still using the legacy standalone resource manager
• Deployment establishes a static execution environment
• You will have a k8s manifest that effectively says
– there should be n taskmanagers that look like this
Flink is not aware of Kubernetes
12 © 2019 Ververica
Master Container
ResourceManager
JobManager
Mini Dispatcher
(2) Run & Start
Worker Container
TaskManager
Worker Container
TaskManager
Worker Container
TaskManager
(3) Register
(4) Deploy Tasks
(0) One image is built that can be either a Master or Worker
(1) Container framework starts Master & Worker Containers
Flink job cluster on K8s
13 © 2019 Ververica
2. EXAMPLE
https://github.com/alpinegizmo/flink-containers-example
14 © 2019 Ververica
Very Simple Streaming Job
https://github.com/alpinegizmo/flink-containers-example
data generator RichFlatMap print
# events per user
keyBy
15 © 2019 Ververica
16 © 2019 Ververica
Desired Runtime Landscape for K8s
17 © 2019 Ververica
Steps
1. Build the docker image
2. Set up job cluster (k8s job) &
task managers (k8s deployment)
3. Set up job cluster service
4. Add minio for checkpoints
18 © 2019 Ververica
1: Build a docker image
ADD $flink_dist $FLINK_INSTALL_PATH
ADD $job_jar $FLINK_INSTALL_PATH/job.jar
. . .
COPY docker/flink/flink-conf.yaml $FLINK_HOME/conf
COPY docker/flink/log4j-console.properties $FLINK_HOME/conf
COPY docker/flink/docker-entrypoint.sh /
. . .
ENTRYPOINT ["/docker-entrypoint.sh"]
Dockerfile
19 © 2019 Ververica
. . .
JOB_CLUSTER="job-cluster"
TASK_MANAGER="task-manager"
CMD="$1"
shift;
if [ "${CMD}" == "${JOB_CLUSTER}" -o "${CMD}" == "${TASK_MANAGER}" ]; then
if [ "${CMD}" == "${TASK_MANAGER}" ]; then
exec $FLINK_HOME/bin/taskmanager.sh start-foreground "$@"
else
exec $FLINK_HOME/bin/standalone-job.sh start-foreground "$@"
fi
fi
exec "$@"
docker-entrypoint.sh
20 © 2019 Ververica
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: flink-task-manager
spec:
replicas: ${FLINK_NUM_OF_TASKMANAGERS}
template:
metadata:
labels:
app: flink
component: task-manager
spec:
containers:
- name: flink-task-manager
image: ${FLINK_IMAGE_NAME}
imagePullPolicy: Never
args: ["task-manager",
"-Djobmanager.rpc.address=flink-job-cluster"]
task-manager-deployment.yaml.template
apiVersion: batch/v1
kind: Job
metadata:
name: flink-job-cluster
spec:
template:
metadata:
labels:
app: flink
component: job-cluster
spec:
restartPolicy: OnFailure
containers:
- name: flink-job-cluster
image: ${FLINK_IMAGE_NAME}
imagePullPolicy: Never
args: ["job-cluster",
"-Djobmanager.rpc.address=flink-job-cluster",
"-Dblob.server.port=6124",
"-Dqueryable-state.server.ports=6125"]
ports:
- containerPort: 6123
name: rpc
- containerPort: 6124
name: blob
- containerPort: 6125
name: query
- containerPort: 8081
name: ui
job-cluster-job.yaml.template
2: K8s manifests
21 © 2019 Ververica
task-manager-deployment.yaml.template
apiVersion: batch/v1
kind: Job
metadata:
name: flink-job-cluster
spec:
template:
metadata:
labels:
app: flink
component: job-cluster
spec:
restartPolicy: OnFailure
containers:
- name: flink-job-cluster
image: ${FLINK_IMAGE_NAME}
imagePullPolicy: Never
args: ["job-cluster",
"-Djobmanager.rpc.address=flink-job-cluster",
"-Dblob.server.port=6124",
"-Dqueryable-state.server.ports=6125"]
ports:
- containerPort: 6123
name: rpc
- containerPort: 6124
name: blob
- containerPort: 6125
name: query
- containerPort: 8081
name: ui
job-cluster-job.yaml.template
2: K8s manifests
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: flink-task-manager
spec:
replicas: ${FLINK_NUM_OF_TASKMANAGERS}
template:
metadata:
labels:
app: flink
component: task-manager
spec:
containers:
- name: flink-task-manager
image: ${FLINK_IMAGE_NAME}
imagePullPolicy: Never
args: ["task-manager",
"-Djobmanager.rpc.address=flink-job-cluster"]
22 © 2019 Ververica
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: flink-task-manager
spec:
replicas: ${FLINK_NUM_OF_TASKMANAGERS}
template:
metadata:
labels:
app: flink
component: task-manager
spec:
containers:
- name: flink-task-manager
image: ${FLINK_IMAGE_NAME}
imagePullPolicy: Never
args: ["task-manager",
"-Djobmanager.rpc.address=flink-job-cluster"]
task-manager-deployment.yaml.template
apiVersion: batch/v1
kind: Job
metadata:
name: flink-job-cluster
spec:
template:
metadata:
labels:
app: flink
component: job-cluster
spec:
restartPolicy: OnFailure
containers:
- name: flink-job-cluster
image: ${FLINK_IMAGE_NAME}
imagePullPolicy: Never
args: ["job-cluster",
"-Djobmanager.rpc.address=flink-job-cluster",
"-Dblob.server.port=6124",
"-Dqueryable-state.server.ports=6125"]
ports:
- containerPort: 6123
name: rpc
- containerPort: 6124
name: blob
- containerPort: 6125
name: query
- containerPort: 8081
name: ui
job-cluster-job.yaml.template
2: K8s manifests
23 © 2019 Ververica
24 © 2019 Ververica
apiVersion: v1
kind: Service
metadata:
name: flink-job-cluster
labels:
app: flink
component: job-cluster
spec:
ports:
- name: rpc
port: 6123
- name: blob
port: 6124
- name: query
port: 6125
nodePort: 30025
- name: ui
port: 8081
nodePort: 30081
type: NodePort
selector:
app: flink
component: job-cluster
3: Expose job cluster as a service
job-cluster-service.yaml
internal ports
external ports
25 © 2019 Ververica
26 © 2019 Ververica
4: Setup minio for checkpoints & savepoints
• S3-compatible storage service
• Apache License v2.0
• Lightweight, easy to setup
27 © 2019 Ververica
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: minio-pv-claim
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
minio-standalone-pvc.yaml
28 © 2019 Ververica
minio-standalone-deployment.yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: minio
spec:
strategy:
type: Recreate
template:
metadata:
labels:
app: minio
spec:
volumes:
- name: data
persistentVolumeClaim:
claimName: minio-pv-claim
containers:
- name: minio
volumeMounts:
- name: data
mountPath: "/data"
image: minio/minio:RELEASE.2019-03-13T21-59-47Z
args:
- server
- /data
env:
- name: MINIO_ACCESS_KEY
value: "minio"
- name: MINIO_SECRET_KEY
value: "minio123"
ports:
- containerPort: 9000
livenessProbe:
httpGet:
path: /minio/health/live
port: 9000
initialDelaySeconds: 120
periodSeconds: 20
29 © 2019 Ververica
apiVersion: v1
kind: Service
metadata:
name: minio-service
spec:
type: NodePort
ports:
- port: 9000
nodePort: 30090
selector:
app: minio
s3.path-style-access: true
s3.endpoint: http://minio-service:9000
minio-standalone-service.yaml
flink-conf.yaml
30 © 2019 Ververica
/bin/sh -c "
sleep 10;
/usr/bin/mc config host add myminio http://minio-service:9000 minio minio123;
/usr/bin/mc mb myminio/state;
exit 0;
"
minio setup job
state.checkpoints.dir: s3://state/checkpoints
state.savepoints.dir: s3://state/savepoints
s3.access-key: minio
s3.secret-key: minio123
flink-conf.yaml
31 © 2019 Ververica
A Note on Bucket Addresses
• Two ways to specify buckets:
– virtual-hosted style: state.minio-service:9000
– path-style: minio-service:9000/state
• It’s easier to get path-style addresses working, by either using
– s3.path-style-access: true (requires flink 1.8+)
or by
– specifying the endpoint with its IP address, rather than hostname
32 © 2019 Ververica
33 © 2019 Ververica
34 © 2019 Ververica
Rescaling
$ kubectl scale deployment -l component=task-manager --replicas=2
deployment.extensions "flink-task-manager" scaled
$ flink modify 00000000000000000000000000000000 -p 8 -m localhost:30081
Modify job 00000000000000000000000000000000.
Rescaled job 00000000000000000000000000000000. Its new parallelism is 8.
35 © 2019 Ververica
3. DEBUGGING
36 © 2019 Ververica
. . .
JOB_CLUSTER="job-cluster"
TASK_MANAGER="task-manager"
if [ "${CMD}" == "${JOB_CLUSTER}" -o "${CMD}" == "${TASK_MANAGER}" ]; then
echo "Starting the ${CMD}"
echo "config file: " && grep '^[^n#]' $FLINK_HOME/conf/flink-conf.yaml
if [ "${CMD}" == "${TASK_MANAGER}" ]; then
exec $FLINK_HOME/bin/taskmanager.sh start-foreground "$@"
else
exec $FLINK_HOME/bin/standalone-job.sh start-foreground "$@"
fi
fi
exec "$@"
docker-entrypoint.sh
37 © 2019 Ververica
Starting the job-cluster
config file:
jobmanager.rpc.address: localhost
jobmanager.rpc.port: 6123
jobmanager.heap.size: 1024m
taskmanager.heap.size: 1024m
taskmanager.numberOfTaskSlots: 4
parallelism.default: 1
high-availability: zookeeper
high-availability.jobmanager.port: 6123
high-availability.storageDir: s3://highavailability/storage
high-availability.zookeeper.quorum: zoo1:2181
state.backend: filesystem
state.checkpoints.dir: s3://state/checkpoints
state.savepoints.dir: s3://state/savepoints
rest.port: 8081
zookeeper.sasl.disable: true
s3.access-key: minio
s3.secret-key: minio123
s3.path-style-access: true
s3.endpoint: http://minio-service:9000
logs
38 © 2019 Ververica
39 © 2019 Ververica
4. FUTURE PLANS
40 © 2019 Ververica
Tighter Integration with K8s
• Active mode
– Flink is aware of the cluster manager that it is running on,
and interacts with it
– Examples exist, e.g., FLIP-6 YARN
• Reactive mode
– Flink is oblivious to its environment
– Flink may react to resources changes by scaling job
41 © 2019 Ververica
Active k8s Integration
K8s deployment
controller
Client
TaskManager
JobManager
K8sResourceManager
ApplicationMaster
TaskManager
(3) Submit job
(1) Submit AM deployment
(2) Start AM
pod
(4) Start JM
(5) Request slots
(6) Submit TM
deployment
(7) Start TM pod
(8) Register(9) Request slots
(10) Offer slots
42 © 2019 Ververica
FLINK-9953: Active Kubernetes integration
The ResourceManager can talk to Kubernetes to launch new pods
43 © 2019 Ververica
Reactive Container Mode
• Relies on external system to start/release
TaskManagers, e.g.,
– Kubernetes Horizontal Pod Autoscaler
– GCP Autoscaling
– AWS Auto Scaling Group
• Re-scale job as resources are
added/removed (take savepoint and resume
job with new parallelism automatically)
• By definition works with all cluster managers
Flink cluster
JM TM TM
ASG
Start new TM if
CPU% > threshold
Monitor metrics, e.g, CPU%
Register
& offer
slots
Event rate over time
44 © 2019 Ververica
FLINK-10407: Reactive container mode
Re-scale job as resources are added/removed
45 © 2019 Ververica
Summary
• Flink currently supports job and session clusters on K8s
• Example
– https://github.com/alpinegizmo/flink-containers-example
• Active K8s integration is in progress
• Reactive container mode has been designed/planned
• Call to action:
– Umbrella tickets: FLINK-9953, FLINK-10407
– Join discussions on dev@flink.apache.org
46 © 2019 Ververica
Thank you!
47 © 2019 Ververica
Questions?
48 © 2019 Ververica
www.ververica.com @VervericaDatadavid@ververica.com

More Related Content

PPTX
Using the New Apache Flink Kubernetes Operator in a Production Deployment
PDF
Introducing the Apache Flink Kubernetes Operator
PPTX
Autoscaling Flink with Reactive Mode
PPTX
Practical learnings from running thousands of Flink jobs
PPTX
Where is my bottleneck? Performance troubleshooting in Flink
PDF
Apache Iceberg: An Architectural Look Under the Covers
PPTX
Introduction to Apache Kafka
PDF
Building a fully managed stream processing platform on Flink at scale for Lin...
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Introducing the Apache Flink Kubernetes Operator
Autoscaling Flink with Reactive Mode
Practical learnings from running thousands of Flink jobs
Where is my bottleneck? Performance troubleshooting in Flink
Apache Iceberg: An Architectural Look Under the Covers
Introduction to Apache Kafka
Building a fully managed stream processing platform on Flink at scale for Lin...

What's hot (20)

PPTX
Evening out the uneven: dealing with skew in Flink
PPTX
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
PDF
Iceberg: A modern table format for big data (Strata NY 2018)
PDF
A Thorough Comparison of Delta Lake, Iceberg and Hudi
PPTX
Apache Arrow: In Theory, In Practice
PPTX
PPTX
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
PPTX
The top 3 challenges running multi-tenant Flink at scale
PPTX
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
PDF
Parquet performance tuning: the missing guide
PDF
Apache Iceberg - A Table Format for Hige Analytic Datasets
PPTX
Apache Kafka Best Practices
PDF
Apache Nifi Crash Course
PPTX
Splunk overview
PDF
Hardening Kafka Replication
PDF
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
PDF
The Parquet Format and Performance Optimization Opportunities
PPTX
Tuning Apache Kafka Connectors for Flink.pptx
PPTX
Apache Kudu: Technical Deep Dive


PDF
An Introduction to Apache Kafka
Evening out the uneven: dealing with skew in Flink
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Iceberg: A modern table format for big data (Strata NY 2018)
A Thorough Comparison of Delta Lake, Iceberg and Hudi
Apache Arrow: In Theory, In Practice
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
The top 3 challenges running multi-tenant Flink at scale
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Parquet performance tuning: the missing guide
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Kafka Best Practices
Apache Nifi Crash Course
Splunk overview
Hardening Kafka Replication
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
The Parquet Format and Performance Optimization Opportunities
Tuning Apache Kafka Connectors for Flink.pptx
Apache Kudu: Technical Deep Dive


An Introduction to Apache Kafka
Ad

Similar to Deploying Flink on Kubernetes - David Anderson (20)

PDF
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
PDF
Flink Forward San Francisco 2019: Future of Apache Flink Deployments: Contain...
PDF
Dockerizing OpenStack for High Availability
PDF
Scaling docker with kubernetes
PPTX
Kubernetes for the VI Admin
PPTX
Photon Controller: An Open Source Container Infrastructure Platform from VMware
PPTX
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
PDF
DockerCon 2022 - From legacy to Kubernetes, securely & quickly
PDF
18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes
PDF
The Kubernetes WebLogic revival (part 2)
PDF
Docker kubernetes fundamental(pod_service)_190307
PPTX
K8s in 3h - Kubernetes Fundamentals Training
PPTX
20191201 kubernetes managed weblogic revival - part 2
PPTX
Kubernetes workshop -_the_basics
PDF
VMware Tanzu Introduction- June 11, 2020
PDF
Get you Java application ready for Kubernetes !
PDF
DevNetCreate - ACI and Kubernetes Integration
PDF
Part 4: Custom Buildpacks and Data Services (Pivotal Cloud Platform Roadshow)
PDF
Deploying Kubernetes in the Enterprise (IBM #Think2019 #7678 Tech Talk)
PDF
Create a Varnish cluster in Kubernetes for Drupal caching - DrupalCon North A...
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Flink Forward San Francisco 2019: Future of Apache Flink Deployments: Contain...
Dockerizing OpenStack for High Availability
Scaling docker with kubernetes
Kubernetes for the VI Admin
Photon Controller: An Open Source Container Infrastructure Platform from VMware
01 - VMUGIT - Lecce 2018 - Fabio Rapposelli, VMware
DockerCon 2022 - From legacy to Kubernetes, securely & quickly
18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes
The Kubernetes WebLogic revival (part 2)
Docker kubernetes fundamental(pod_service)_190307
K8s in 3h - Kubernetes Fundamentals Training
20191201 kubernetes managed weblogic revival - part 2
Kubernetes workshop -_the_basics
VMware Tanzu Introduction- June 11, 2020
Get you Java application ready for Kubernetes !
DevNetCreate - ACI and Kubernetes Integration
Part 4: Custom Buildpacks and Data Services (Pivotal Cloud Platform Roadshow)
Deploying Kubernetes in the Enterprise (IBM #Think2019 #7678 Tech Talk)
Create a Varnish cluster in Kubernetes for Drupal caching - DrupalCon North A...
Ad

More from Ververica (20)

PDF
2020-05-06 Apache Flink Meetup London: The Easiest Way to Get Operational wit...
PDF
Webinar: How to contribute to Apache Flink - Robert Metzger
PDF
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
PDF
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
PDF
Webinar: Detecting row patterns with Flink SQL - Dawid Wysakowicz
PPTX
Webinar: Flink SQL in Action - Fabian Hueske
PPTX
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
PDF
2018-01 Seattle Apache Flink Meetup at OfferUp, Opening Remarks and Talk 2
PPTX
Stephan Ewen - Experiences running Flink at Very Large Scale
PPTX
Fabian Hueske - Stream Analytics with SQL on Apache Flink
PDF
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
PPTX
Kostas Kloudas - Complex Event Processing with Flink: the state of FlinkCEP
PDF
Aljoscha Krettek - Portable stateful big data processing in Apache Beam
PDF
Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...
PDF
Timo Walther - Table & SQL API - unified APIs for batch and stream processing
PDF
Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream ...
PPTX
Kostas Kloudas - Extending Flink's Streaming APIs
PPTX
Fabian Hueske - Stream Analytics with SQL on Apache Flink
PPTX
Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink'...
PDF
Stefan Richter - A look at Flink 1.2 and beyond @ Berlin Meetup
2020-05-06 Apache Flink Meetup London: The Easiest Way to Get Operational wit...
Webinar: How to contribute to Apache Flink - Robert Metzger
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: Detecting row patterns with Flink SQL - Dawid Wysakowicz
Webinar: Flink SQL in Action - Fabian Hueske
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-01 Seattle Apache Flink Meetup at OfferUp, Opening Remarks and Talk 2
Stephan Ewen - Experiences running Flink at Very Large Scale
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Kostas Kloudas - Complex Event Processing with Flink: the state of FlinkCEP
Aljoscha Krettek - Portable stateful big data processing in Apache Beam
Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...
Timo Walther - Table & SQL API - unified APIs for batch and stream processing
Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream ...
Kostas Kloudas - Extending Flink's Streaming APIs
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink'...
Stefan Richter - A look at Flink 1.2 and beyond @ Berlin Meetup

Recently uploaded (20)

PPTX
Rise of the Digital Control Grid Zeee Media and Hope and Tivon FTWProject.com
PDF
Child-friendly e-learning for artificial intelligence education in Indonesia:...
PDF
Human Computer Interaction Miterm Lesson
PDF
Addressing the challenges of harmonizing law and artificial intelligence tech...
PDF
Fitaura: AI & Machine Learning Powered Fitness Tracker
PPTX
Slides World Game (s) Great Redesign Eco Economic Epochs.pptx
PDF
Altius execution marketplace concept.pdf
PPTX
AQUEEL MUSHTAQUE FAKIH COMPUTER CENTER .
PPTX
Information-Technology-in-Human-Society.pptx
PPTX
Presentation - Principles of Instructional Design.pptx
PPTX
From Curiosity to ROI — Cost-Benefit Analysis of Agentic Automation [3/6]
PDF
TicketRoot: Event Tech Solutions Deck 2025
PDF
Revolutionizing recommendations a survey: a comprehensive exploration of mode...
PDF
CCUS-as-the-Missing-Link-to-Net-Zero_AksCurious.pdf
PDF
ELLIE29.pdfWETWETAWTAWETAETAETERTRTERTER
PPTX
Information-Technology-in-Human-Society (2).pptx
PDF
Examining Bias in AI Generated News Content.pdf
PDF
GDG Cloud Southlake #45: Patrick Debois: The Impact of GenAI on Development a...
PDF
Advancements in abstractive text summarization: a deep learning approach
PDF
FASHION-DRIVEN TEXTILES AS A CRYSTAL OF A NEW STREAM FOR STAKEHOLDER CAPITALI...
Rise of the Digital Control Grid Zeee Media and Hope and Tivon FTWProject.com
Child-friendly e-learning for artificial intelligence education in Indonesia:...
Human Computer Interaction Miterm Lesson
Addressing the challenges of harmonizing law and artificial intelligence tech...
Fitaura: AI & Machine Learning Powered Fitness Tracker
Slides World Game (s) Great Redesign Eco Economic Epochs.pptx
Altius execution marketplace concept.pdf
AQUEEL MUSHTAQUE FAKIH COMPUTER CENTER .
Information-Technology-in-Human-Society.pptx
Presentation - Principles of Instructional Design.pptx
From Curiosity to ROI — Cost-Benefit Analysis of Agentic Automation [3/6]
TicketRoot: Event Tech Solutions Deck 2025
Revolutionizing recommendations a survey: a comprehensive exploration of mode...
CCUS-as-the-Missing-Link-to-Net-Zero_AksCurious.pdf
ELLIE29.pdfWETWETAWTAWETAETAETERTRTERTER
Information-Technology-in-Human-Society (2).pptx
Examining Bias in AI Generated News Content.pdf
GDG Cloud Southlake #45: Patrick Debois: The Impact of GenAI on Development a...
Advancements in abstractive text summarization: a deep learning approach
FASHION-DRIVEN TEXTILES AS A CRYSTAL OF A NEW STREAM FOR STAKEHOLDER CAPITALI...

Deploying Flink on Kubernetes - David Anderson

  • 1. © 2019 Ververica David Anderson | @alpinegizmo | Training Coordinator Getting Started with Apache Flink® on Kubernetes
  • 2. 2 © 2019 Ververica About Ververica Creators of Apache Flink® Real Time Stream Processing for the Enterprise
  • 3. 3 © 2019 Ververica Outline 1. Introduction 2. Detailed Example 3. Debugging Tips 4. Future Plans
  • 4. 4 © 2019 Ververica Why Containers? • Containers provide isolation at low cost – Require fewer resources than VMs – Smaller, boot faster • Simpler to manage – Each container does one thing – Consistent packaging • Enables flexible and dynamic resource allocation – Scalable – Composable
  • 5. 5 © 2019 Ververica Container Orchestration with Kubernetes • Declarative configuration: – You tell K8s the desired state, and a background process makes it happen • 3 replicas of this container should be kept running • A load balancer should exist, listening on port 443, backed by container with this label • Core resource types: – Pod: a group of one or more containers – Job: keeps pod(s) running until finished – Deployment: keeps n pods running indefinitely – Service: a REST object backed by a set of pods – Persistent Volume Claim: storage whose lifetime is not coupled to any of the pods
  • 6. 6 © 2019 Ververica Vision: Flink as a Library • Makes deployments simpler – Focus is on deploying/running an application – You build one, complete job-specific Docker image that includes: • Your application code • Flink libraries • Other dependencies • Configuration files
  • 7. 7 © 2019 Ververica Flink’s Runtime Building Blocks • Cluster framework-specific • Manages available TaskManagers • Acquires / releases resources ResourceManager TaskManagerJobManager • Registers with ResourceManager • Provides “task slots” • Assigned tasks by one or more JobManagers • One per job • Schedules job in terms of "task slots" • Monitors task execution • Coordinates checkpointing Dispatcher • Touch-point for job submissions • Spawns JobManagers
  • 8. 8 © 2019 Ververica Flink’s Runtime Building Blocks • Cluster framework-specific • Manages available TaskManagers • Acquires / releases resources ResourceManager TaskManagerJobManager • Registers with ResourceManager • Provides “task slots” • Assigned tasks by JobManager(s) • One per job • Schedules job in terms of "task slots" • Monitors task execution • Coordinates checkpointing Dispatcher • Touch-point for job submissions • Spawns JobManagers
  • 9. 9 © 2019 Ververica Runtime Building Blocks (on Yarn) ResourceManager (3) Request slots TaskManager JobManager (4) Start TaskManager (5) Register (7) Deploy Tasks Dispatcher App/Client (1) Submit Job (2) Start JobManager (6) Offer slots
  • 10. 10 © 2019 Ververica But we’re not quite there yet with K8s
  • 11. 11 © 2019 Ververica Flink on K8s: current status • Still using the legacy standalone resource manager • Deployment establishes a static execution environment • You will have a k8s manifest that effectively says – there should be n taskmanagers that look like this Flink is not aware of Kubernetes
  • 12. 12 © 2019 Ververica Master Container ResourceManager JobManager Mini Dispatcher (2) Run & Start Worker Container TaskManager Worker Container TaskManager Worker Container TaskManager (3) Register (4) Deploy Tasks (0) One image is built that can be either a Master or Worker (1) Container framework starts Master & Worker Containers Flink job cluster on K8s
  • 13. 13 © 2019 Ververica 2. EXAMPLE https://github.com/alpinegizmo/flink-containers-example
  • 14. 14 © 2019 Ververica Very Simple Streaming Job https://github.com/alpinegizmo/flink-containers-example data generator RichFlatMap print # events per user keyBy
  • 15. 15 © 2019 Ververica
  • 16. 16 © 2019 Ververica Desired Runtime Landscape for K8s
  • 17. 17 © 2019 Ververica Steps 1. Build the docker image 2. Set up job cluster (k8s job) & task managers (k8s deployment) 3. Set up job cluster service 4. Add minio for checkpoints
  • 18. 18 © 2019 Ververica 1: Build a docker image ADD $flink_dist $FLINK_INSTALL_PATH ADD $job_jar $FLINK_INSTALL_PATH/job.jar . . . COPY docker/flink/flink-conf.yaml $FLINK_HOME/conf COPY docker/flink/log4j-console.properties $FLINK_HOME/conf COPY docker/flink/docker-entrypoint.sh / . . . ENTRYPOINT ["/docker-entrypoint.sh"] Dockerfile
  • 19. 19 © 2019 Ververica . . . JOB_CLUSTER="job-cluster" TASK_MANAGER="task-manager" CMD="$1" shift; if [ "${CMD}" == "${JOB_CLUSTER}" -o "${CMD}" == "${TASK_MANAGER}" ]; then if [ "${CMD}" == "${TASK_MANAGER}" ]; then exec $FLINK_HOME/bin/taskmanager.sh start-foreground "$@" else exec $FLINK_HOME/bin/standalone-job.sh start-foreground "$@" fi fi exec "$@" docker-entrypoint.sh
  • 20. 20 © 2019 Ververica apiVersion: extensions/v1beta1 kind: Deployment metadata: name: flink-task-manager spec: replicas: ${FLINK_NUM_OF_TASKMANAGERS} template: metadata: labels: app: flink component: task-manager spec: containers: - name: flink-task-manager image: ${FLINK_IMAGE_NAME} imagePullPolicy: Never args: ["task-manager", "-Djobmanager.rpc.address=flink-job-cluster"] task-manager-deployment.yaml.template apiVersion: batch/v1 kind: Job metadata: name: flink-job-cluster spec: template: metadata: labels: app: flink component: job-cluster spec: restartPolicy: OnFailure containers: - name: flink-job-cluster image: ${FLINK_IMAGE_NAME} imagePullPolicy: Never args: ["job-cluster", "-Djobmanager.rpc.address=flink-job-cluster", "-Dblob.server.port=6124", "-Dqueryable-state.server.ports=6125"] ports: - containerPort: 6123 name: rpc - containerPort: 6124 name: blob - containerPort: 6125 name: query - containerPort: 8081 name: ui job-cluster-job.yaml.template 2: K8s manifests
  • 21. 21 © 2019 Ververica task-manager-deployment.yaml.template apiVersion: batch/v1 kind: Job metadata: name: flink-job-cluster spec: template: metadata: labels: app: flink component: job-cluster spec: restartPolicy: OnFailure containers: - name: flink-job-cluster image: ${FLINK_IMAGE_NAME} imagePullPolicy: Never args: ["job-cluster", "-Djobmanager.rpc.address=flink-job-cluster", "-Dblob.server.port=6124", "-Dqueryable-state.server.ports=6125"] ports: - containerPort: 6123 name: rpc - containerPort: 6124 name: blob - containerPort: 6125 name: query - containerPort: 8081 name: ui job-cluster-job.yaml.template 2: K8s manifests apiVersion: extensions/v1beta1 kind: Deployment metadata: name: flink-task-manager spec: replicas: ${FLINK_NUM_OF_TASKMANAGERS} template: metadata: labels: app: flink component: task-manager spec: containers: - name: flink-task-manager image: ${FLINK_IMAGE_NAME} imagePullPolicy: Never args: ["task-manager", "-Djobmanager.rpc.address=flink-job-cluster"]
  • 22. 22 © 2019 Ververica apiVersion: extensions/v1beta1 kind: Deployment metadata: name: flink-task-manager spec: replicas: ${FLINK_NUM_OF_TASKMANAGERS} template: metadata: labels: app: flink component: task-manager spec: containers: - name: flink-task-manager image: ${FLINK_IMAGE_NAME} imagePullPolicy: Never args: ["task-manager", "-Djobmanager.rpc.address=flink-job-cluster"] task-manager-deployment.yaml.template apiVersion: batch/v1 kind: Job metadata: name: flink-job-cluster spec: template: metadata: labels: app: flink component: job-cluster spec: restartPolicy: OnFailure containers: - name: flink-job-cluster image: ${FLINK_IMAGE_NAME} imagePullPolicy: Never args: ["job-cluster", "-Djobmanager.rpc.address=flink-job-cluster", "-Dblob.server.port=6124", "-Dqueryable-state.server.ports=6125"] ports: - containerPort: 6123 name: rpc - containerPort: 6124 name: blob - containerPort: 6125 name: query - containerPort: 8081 name: ui job-cluster-job.yaml.template 2: K8s manifests
  • 23. 23 © 2019 Ververica
  • 24. 24 © 2019 Ververica apiVersion: v1 kind: Service metadata: name: flink-job-cluster labels: app: flink component: job-cluster spec: ports: - name: rpc port: 6123 - name: blob port: 6124 - name: query port: 6125 nodePort: 30025 - name: ui port: 8081 nodePort: 30081 type: NodePort selector: app: flink component: job-cluster 3: Expose job cluster as a service job-cluster-service.yaml internal ports external ports
  • 25. 25 © 2019 Ververica
  • 26. 26 © 2019 Ververica 4: Setup minio for checkpoints & savepoints • S3-compatible storage service • Apache License v2.0 • Lightweight, easy to setup
  • 27. 27 © 2019 Ververica apiVersion: v1 kind: PersistentVolumeClaim metadata: name: minio-pv-claim spec: accessModes: - ReadWriteOnce resources: requests: storage: 10Gi minio-standalone-pvc.yaml
  • 28. 28 © 2019 Ververica minio-standalone-deployment.yaml apiVersion: extensions/v1beta1 kind: Deployment metadata: name: minio spec: strategy: type: Recreate template: metadata: labels: app: minio spec: volumes: - name: data persistentVolumeClaim: claimName: minio-pv-claim containers: - name: minio volumeMounts: - name: data mountPath: "/data" image: minio/minio:RELEASE.2019-03-13T21-59-47Z args: - server - /data env: - name: MINIO_ACCESS_KEY value: "minio" - name: MINIO_SECRET_KEY value: "minio123" ports: - containerPort: 9000 livenessProbe: httpGet: path: /minio/health/live port: 9000 initialDelaySeconds: 120 periodSeconds: 20
  • 29. 29 © 2019 Ververica apiVersion: v1 kind: Service metadata: name: minio-service spec: type: NodePort ports: - port: 9000 nodePort: 30090 selector: app: minio s3.path-style-access: true s3.endpoint: http://minio-service:9000 minio-standalone-service.yaml flink-conf.yaml
  • 30. 30 © 2019 Ververica /bin/sh -c " sleep 10; /usr/bin/mc config host add myminio http://minio-service:9000 minio minio123; /usr/bin/mc mb myminio/state; exit 0; " minio setup job state.checkpoints.dir: s3://state/checkpoints state.savepoints.dir: s3://state/savepoints s3.access-key: minio s3.secret-key: minio123 flink-conf.yaml
  • 31. 31 © 2019 Ververica A Note on Bucket Addresses • Two ways to specify buckets: – virtual-hosted style: state.minio-service:9000 – path-style: minio-service:9000/state • It’s easier to get path-style addresses working, by either using – s3.path-style-access: true (requires flink 1.8+) or by – specifying the endpoint with its IP address, rather than hostname
  • 32. 32 © 2019 Ververica
  • 33. 33 © 2019 Ververica
  • 34. 34 © 2019 Ververica Rescaling $ kubectl scale deployment -l component=task-manager --replicas=2 deployment.extensions "flink-task-manager" scaled $ flink modify 00000000000000000000000000000000 -p 8 -m localhost:30081 Modify job 00000000000000000000000000000000. Rescaled job 00000000000000000000000000000000. Its new parallelism is 8.
  • 35. 35 © 2019 Ververica 3. DEBUGGING
  • 36. 36 © 2019 Ververica . . . JOB_CLUSTER="job-cluster" TASK_MANAGER="task-manager" if [ "${CMD}" == "${JOB_CLUSTER}" -o "${CMD}" == "${TASK_MANAGER}" ]; then echo "Starting the ${CMD}" echo "config file: " && grep '^[^n#]' $FLINK_HOME/conf/flink-conf.yaml if [ "${CMD}" == "${TASK_MANAGER}" ]; then exec $FLINK_HOME/bin/taskmanager.sh start-foreground "$@" else exec $FLINK_HOME/bin/standalone-job.sh start-foreground "$@" fi fi exec "$@" docker-entrypoint.sh
  • 37. 37 © 2019 Ververica Starting the job-cluster config file: jobmanager.rpc.address: localhost jobmanager.rpc.port: 6123 jobmanager.heap.size: 1024m taskmanager.heap.size: 1024m taskmanager.numberOfTaskSlots: 4 parallelism.default: 1 high-availability: zookeeper high-availability.jobmanager.port: 6123 high-availability.storageDir: s3://highavailability/storage high-availability.zookeeper.quorum: zoo1:2181 state.backend: filesystem state.checkpoints.dir: s3://state/checkpoints state.savepoints.dir: s3://state/savepoints rest.port: 8081 zookeeper.sasl.disable: true s3.access-key: minio s3.secret-key: minio123 s3.path-style-access: true s3.endpoint: http://minio-service:9000 logs
  • 38. 38 © 2019 Ververica
  • 39. 39 © 2019 Ververica 4. FUTURE PLANS
  • 40. 40 © 2019 Ververica Tighter Integration with K8s • Active mode – Flink is aware of the cluster manager that it is running on, and interacts with it – Examples exist, e.g., FLIP-6 YARN • Reactive mode – Flink is oblivious to its environment – Flink may react to resources changes by scaling job
  • 41. 41 © 2019 Ververica Active k8s Integration K8s deployment controller Client TaskManager JobManager K8sResourceManager ApplicationMaster TaskManager (3) Submit job (1) Submit AM deployment (2) Start AM pod (4) Start JM (5) Request slots (6) Submit TM deployment (7) Start TM pod (8) Register(9) Request slots (10) Offer slots
  • 42. 42 © 2019 Ververica FLINK-9953: Active Kubernetes integration The ResourceManager can talk to Kubernetes to launch new pods
  • 43. 43 © 2019 Ververica Reactive Container Mode • Relies on external system to start/release TaskManagers, e.g., – Kubernetes Horizontal Pod Autoscaler – GCP Autoscaling – AWS Auto Scaling Group • Re-scale job as resources are added/removed (take savepoint and resume job with new parallelism automatically) • By definition works with all cluster managers Flink cluster JM TM TM ASG Start new TM if CPU% > threshold Monitor metrics, e.g, CPU% Register & offer slots Event rate over time
  • 44. 44 © 2019 Ververica FLINK-10407: Reactive container mode Re-scale job as resources are added/removed
  • 45. 45 © 2019 Ververica Summary • Flink currently supports job and session clusters on K8s • Example – https://github.com/alpinegizmo/flink-containers-example • Active K8s integration is in progress • Reactive container mode has been designed/planned • Call to action: – Umbrella tickets: FLINK-9953, FLINK-10407 – Join discussions on [email protected]
  • 46. 46 © 2019 Ververica Thank you!
  • 47. 47 © 2019 Ververica Questions?
  • 48. 48 © 2019 Ververica www.ververica.com @[email protected]