Running Spark and Flink
on Kubernetes
A Case Study of Kubernetes Operators
Athens Big Data Meetup, Nov 2019
Chaoran Yu
Lightbend Inc.
Kubernetes: the de facto standard for orchestrating containers
Kubernetes Resources
● Pod
Atomic unit of scheduling in K8s. Has its own IP address.
● Deployment
Declarative updates for Pods and ReplicaSets
● PersistentVolume
Storage abstraction. Main way to move state out of containers
● Service, Ingress, StatefulSet and much more!
Custom Resource Definition (CRD)
● Extension of the Kubernetes API
● Lets developers reuse the API server (storage, kubectl access, watches) for their own resource types
● Quickly prototype new features
● Modular design. Can be updated independently of the cluster.
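To make "extension of the Kubernetes API" concrete: once a CRD is registered, the API server serves the new type under a predictable REST path, and the CRD object itself must be named `<plural>.<group>`. The sketch below computes both. The group, version and plural are taken from the Spark Operator's SparkApplication (`sparkoperator.k8s.io/v1beta2`, `sparkapplications`); the helper functions are hypothetical and only illustrate the naming convention.

```python
from typing import Optional


def crd_name(group: str, plural: str) -> str:
    """A CRD's metadata.name must be '<plural>.<group>'."""
    return f"{plural}.{group}"


def custom_resource_path(group: str, version: str, plural: str,
                         namespace: Optional[str] = None,
                         name: Optional[str] = None) -> str:
    """REST path under which the API server serves a custom resource.

    Namespaced objects live under /namespaces/<ns>/; omitting the namespace
    gives the path that lists objects across all namespaces.
    """
    parts = ["", "apis", group, version]
    if namespace:
        parts += ["namespaces", namespace]
    parts.append(plural)
    if name:
        parts.append(name)
    return "/".join(parts)
```

This is the same path kubectl hits for `kubectl get sparkapplication spark-pi -n default`.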
Operator Pattern
• The operator pattern is a way of packaging operational knowledge of an
application and making it native to Kubernetes, often by defining a CRD.
• An operator is an application-specific controller that extends the Kubernetes
API to create, configure, and manage instances of complex stateful
applications on behalf of a Kubernetes user.
Control loop: OBSERVE → EVALUATE → ACT
“Driven by declarative APIs,
actuated asynchronously by
controllers”
- CRDs Aren't Just For Addons, KubeCon Seattle, Dec 2018
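A minimal sketch of one observe, evaluate, act pass, assuming a hypothetical in-memory `Cluster` class in place of the API server. A real operator watches resources through client-go informers and is triggered by events rather than reading a dict. The property worth noticing is idempotence: running the same pass on an already converged state does nothing.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for the objects an operator manages."""
    pods: dict = field(default_factory=dict)  # pod name -> owning app name

    def pods_for(self, app: str) -> list:
        return sorted(name for name, owner in self.pods.items() if owner == app)


def reconcile(app: str, desired_replicas: int, cluster: Cluster) -> list:
    """One pass of the control loop for a single custom resource.

    Returns the actions taken so the behavior is easy to inspect.
    """
    # OBSERVE: read the actual state.
    current = cluster.pods_for(app)
    # EVALUATE: compare it with the desired state declared in the spec.
    diff = desired_replicas - len(current)
    actions = []
    # ACT: converge toward the desired state.
    if diff > 0:
        n = 0
        while diff > 0:
            name = f"{app}-{n}"
            if name not in cluster.pods:  # reuse the lowest free name
                cluster.pods[name] = app
                actions.append(("create", name))
                diff -= 1
            n += 1
    elif diff < 0:
        for name in current[diff:]:  # remove the surplus pods
            del cluster.pods[name]
            actions.append(("delete", name))
    return actions
```

Because each pass recomputes the diff from scratch, a lost event or a crash between passes is harmless: the next pass sees the same gap and closes it.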
Apache Spark
Apache Spark is a scalable and fault-tolerant big data processing engine.
● Scales to thousands of nodes
● Runs on YARN, Mesos and Kubernetes
● Batch and streaming workloads
● Express your streaming computation the same way you would express a SQL
computation on static data:
○ The Spark SQL engine will take care of running it incrementally and continuously. It
updates results as streaming data continues to arrive.
○ Adds streaming SQL extensions, like event-time windows.
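The idea that a streaming query can be expressed like a static one can be checked with a small model: folding micro-batches into running state must give the same answer as running the query once over all the data. The sketch below is plain Python illustrating the semantics, not Spark's API (in Spark you would write something like a `groupBy` over `window(...)` and let the engine maintain the state). Window size and events are made up for illustration.

```python
from collections import Counter


def window_start(event_time: int, size: int) -> int:
    """Start of the tumbling event-time window containing event_time."""
    return event_time - (event_time % size)


def batch_counts(events, size: int) -> Counter:
    """'Static' query: count events per (window, key) over the whole dataset."""
    return Counter((window_start(t, size), key) for t, key in events)


class IncrementalCounts:
    """Streaming version: keeps running state and folds in one micro-batch at a time."""

    def __init__(self, size: int):
        self.size = size
        self.state = Counter()

    def update(self, micro_batch) -> Counter:
        # Counter.update with an iterable adds one count per element.
        self.state.update((window_start(t, self.size), key) for t, key in micro_batch)
        return self.state
```

Because windows are keyed by event time rather than arrival time, an out-of-order event still lands in the right window.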
Spark on Kubernetes
./bin/spark-submit \
  --master k8s://http://127.0.0.1:8001 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.container.image=<my-spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar
Spark Operator
• Open source with Apache License 2.0 at
https://github.com/GoogleCloudPlatform/spark-on-k8s-operator.
• Defines two CustomResourceDefinitions (CRDs), SparkApplication and
ScheduledSparkApplication, to represent Spark jobs.
• CRDs make Spark jobs native citizens in Kubernetes.
• Streamlines the creation, management and monitoring of Spark jobs.
Spark Operator: Architecture
[Figure: Spark Operator component diagram]
Spark Operator: Features
• Enables declarative Spark job specification.
• Invokes spark-submit and supports rich configuration options.
• Supports cron-like scheduled Spark jobs.
• Customizes pods through a mutating admission webhook.
• Automatically re-submits a job when its spec is updated and restarts it on failure.
• Supports exporting Prometheus metrics.
Spark Operator: Installation
• Helm chart available at
https://github.com/helm/charts/tree/master/incubator/sparkoperator.
• $ helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
• $ helm install incubator/sparkoperator
Spark Operator: Job Spec
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "gcr.io/spark-operator/spark:v2.4.4"
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar"
  sparkVersion: "2.4.4"
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark
  executor:
    cores: 1
    instances: 3
  restartPolicy:
    type: OnFailure
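Since the operator works by invoking spark-submit, it helps to see roughly how spec fields turn into submission flags. The sketch below is a simplified, hypothetical translation that covers only the fields in the job spec above; the `--conf` keys are standard Spark-on-Kubernetes settings, but the operator's real mapping handles many more fields and may differ in detail.

```python
def spark_submit_args(app: dict, api_server: str) -> list:
    """Build a spark-submit argument list from a simplified SparkApplication dict.

    Hypothetical helper: only a handful of fields are mapped.
    """
    meta, spec = app["metadata"], app["spec"]
    driver, executor = spec.get("driver", {}), spec.get("executor", {})
    conf = {
        "spark.kubernetes.namespace": meta.get("namespace", "default"),
        "spark.kubernetes.container.image": spec["image"],
    }
    # Optional fields become --conf entries only when they are set.
    optional = {
        "spark.kubernetes.authenticate.driver.serviceAccountName": driver.get("serviceAccount"),
        "spark.driver.cores": driver.get("cores"),
        "spark.driver.memory": driver.get("memory"),
        "spark.executor.cores": executor.get("cores"),
        "spark.executor.memory": executor.get("memory"),
        "spark.executor.instances": executor.get("instances"),
    }
    conf.update({k: str(v) for k, v in optional.items() if v is not None})

    args = [
        "spark-submit",
        "--master", f"k8s://{api_server}",
        "--deploy-mode", spec.get("mode", "cluster"),
        "--name", meta["name"],
    ]
    if "mainClass" in spec:
        args += ["--class", spec["mainClass"]]
    for key in sorted(conf):  # sorted for a stable, readable command line
        args += ["--conf", f"{key}={conf[key]}"]
    args.append(spec["mainApplicationFile"])
    return args
```

Fed the spark-pi spec as a dict and `"http://127.0.0.1:8001"`, this produces essentially the manual spark-submit command from the "Spark on Kubernetes" slide.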
Spark Operator: Basic Operations
• Running a Spark job: kubectl apply -f spark-pi.yaml
• Listing all Spark jobs: kubectl get sparkapplications
• Getting details of a Spark job (e.g. events): kubectl describe sparkapplication spark-pi
• Deleting a Spark job: kubectl delete sparkapplication spark-pi
Spark Operator: State Machine
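The state-machine diagram did not survive extraction, so here is a rough, simplified model of the lifecycle instead. The state names follow those the operator reports in a SparkApplication's status as far as we know (SUBMITTED, RUNNING, COMPLETED, FAILED, PENDING_RERUN and so on; "NEW" is a placeholder for the initial empty state). The event names and the exact transitions are assumptions made for illustration, not the operator's code.

```python
TERMINAL_FAILURES = {"FAILED", "SUBMISSION_FAILED"}

# Simplified transition table: state -> {event: next state}.
# Event names are hypothetical labels for what the operator observes.
TRANSITIONS = {
    "NEW": {"submit_ok": "SUBMITTED", "submit_error": "SUBMISSION_FAILED"},
    "PENDING_RERUN": {"submit_ok": "SUBMITTED", "submit_error": "SUBMISSION_FAILED"},
    "SUBMITTED": {"driver_running": "RUNNING", "driver_failed": "FAILING"},
    "RUNNING": {"driver_succeeded": "SUCCEEDING", "driver_failed": "FAILING"},
    "SUCCEEDING": {"cleanup_done": "COMPLETED"},
    "FAILING": {"cleanup_done": "FAILED"},
    "INVALIDATING": {"cleanup_done": "PENDING_RERUN"},
}


def next_state(state: str, event: str, restart_policy: str = "Never",
               retries_left: int = 0) -> str:
    """Return the state an application moves to after an event."""
    # A spec update invalidates the current run, whatever state it is in.
    if event == "spec_updated":
        return "INVALIDATING"
    nxt = TRANSITIONS.get(state, {}).get(event)
    if nxt is None:
        raise ValueError(f"no transition from {state!r} on {event!r}")
    # The restart policy decides whether a finished run is scheduled again.
    if restart_policy == "Always" and (nxt == "COMPLETED" or nxt in TERMINAL_FAILURES):
        return "PENDING_RERUN"
    if restart_policy == "OnFailure" and nxt in TERMINAL_FAILURES and retries_left > 0:
        return "PENDING_RERUN"
    return nxt
```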
Mutating Admission Webhooks
• A mutating admission webhook is a kind of admission controller that intercepts
requests to the Kubernetes API server and can modify an object before it is
persisted. Beta in K8s v1.9+
• Spark Operator uses it to mount volumes and ConfigMaps in Spark driver and
executor pods.
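A mutating webhook answers the API server with an AdmissionReview response carrying a base64-encoded JSON Patch (RFC 6902). The sketch below builds such a response that adds a volume and mounts it in one container, the kind of change the Spark Operator makes to driver and executor pods. It is a hypothetical illustration, not the operator's webhook; it uses the `admission.k8s.io/v1` response shape (available from Kubernetes 1.16; older clusters use `v1beta1`).

```python
import base64
import json


def volume_patch(pod: dict, volume: dict, mount: dict, container_index: int = 0) -> list:
    """JSON Patch ops that add a volume and mount it in one container.

    Appending with '/-' only works if the array already exists, so the
    array is created when it is missing.
    """
    ops = []
    spec = pod["spec"]
    if "volumes" in spec:
        ops.append({"op": "add", "path": "/spec/volumes/-", "value": volume})
    else:
        ops.append({"op": "add", "path": "/spec/volumes", "value": [volume]})
    base = f"/spec/containers/{container_index}/volumeMounts"
    if "volumeMounts" in spec["containers"][container_index]:
        ops.append({"op": "add", "path": base + "/-", "value": mount})
    else:
        ops.append({"op": "add", "path": base, "value": [mount]})
    return ops


def admission_response(uid: str, ops: list) -> dict:
    """Wrap the patch in an AdmissionReview response for the request with this uid."""
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": uid,  # must echo the uid of the incoming request
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": base64.b64encode(json.dumps(ops).encode()).decode(),
        },
    }
```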
Mounting ConfigMaps
• Specify Spark configuration by putting files such as spark-defaults.conf,
spark-env.sh and log4j.properties into a ConfigMap and referencing it as
.spec.sparkConfigMap in the YAML.
• Specify Hadoop configuration by putting core-site.xml and hdfs-site.xml into
a ConfigMap and referencing it as .spec.hadoopConfigMap in the YAML.
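A sketch of the moving parts, assuming plain dicts for manifests. A ConfigMap's `data` keys are file names, and when the ConfigMap is mounted as a volume each key appears as a file in the mount directory. The SparkApplication only references the ConfigMap by name; as we understand it, the operator then mounts it (for example at `/etc/spark/conf`, which we treat as an assumption) and points Spark at that directory. The helper names are hypothetical.

```python
import copy


def config_map(name: str, namespace: str, files: dict) -> dict:
    """v1 ConfigMap whose data keys are file names and values are file contents."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        "data": dict(files),
    }


def with_config_maps(app: dict, spark_cm: dict = None, hadoop_cm: dict = None) -> dict:
    """Copy of a SparkApplication dict that references the ConfigMaps by name."""
    app = copy.deepcopy(app)
    spec = app.setdefault("spec", {})
    if spark_cm is not None:
        spec["sparkConfigMap"] = spark_cm["metadata"]["name"]
    if hadoop_cm is not None:
        spec["hadoopConfigMap"] = hadoop_cm["metadata"]["name"]
    return app


def projected_paths(cm: dict, mount_path: str) -> list:
    """Files a ConfigMap volume exposes: one file per data key under mount_path."""
    return sorted(f"{mount_path.rstrip('/')}/{key}" for key in cm["data"])
```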
Mounting Volumes
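• When using the Spark history server, the driver must write its event log to a
volume that the history server can also read. The example mounts the same
PersistentVolumeClaim in both the driver and executor pods.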
sparkConf:
  "spark.eventLog.enabled": "true"
  "spark.eventLog.dir": "file:/mnt"
volumes:
  - name: spark-data
    persistentVolumeClaim:
      claimName: spark-hs-pvc
driver:
  volumeMounts:
    - name: spark-data
      mountPath: /mnt
executor:
  volumeMounts:
    - name: spark-data
      mountPath: /mnt
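A common mistake is pointing `spark.eventLog.dir` at a path that is not on the shared volume, so the history server never sees the logs. The hypothetical check below takes the spec fragment above as a dict and verifies that the event log directory lies under a driver mount backed by the given claim. It checks the driver only, because the driver is the process that writes the event log.

```python
import posixpath
from urllib.parse import urlparse


def claim_volumes(spec: dict, claim: str) -> set:
    """Names of volumes in the spec that are backed by the given PVC."""
    return {
        v["name"]
        for v in spec.get("volumes", [])
        if v.get("persistentVolumeClaim", {}).get("claimName") == claim
    }


def event_log_on_claim(spec: dict, claim: str) -> bool:
    """True if the driver writes its event log somewhere under a mount of `claim`."""
    conf = spec.get("sparkConf", {})
    if conf.get("spark.eventLog.enabled") != "true":
        return False
    url = urlparse(conf.get("spark.eventLog.dir", ""))
    if url.scheme not in ("", "file"):
        return False  # e.g. hdfs:// logs do not live on a mounted volume
    log_dir = posixpath.normpath(url.path)
    volumes = claim_volumes(spec, claim)
    for mount in spec.get("driver", {}).get("volumeMounts", []):
        if mount["name"] not in volumes:
            continue
        root = posixpath.normpath(mount["mountPath"])
        if log_dir == root or log_dir.startswith(root.rstrip("/") + "/"):
            return True
    return False
```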
Job Monitoring with Prometheus
• The Spark Operator configures the Prometheus JMX exporter to run as a
Java agent.
• The Spark Operator supports emitting two sets of metrics:
• Driver and executor metrics (e.g. spark_driver_appStatus_jobDuration)
• Application-level metrics (e.g. spark_app_running_count)
• To expose driver and executor metrics, the Spark application Docker image
needs to contain the Prometheus JMX exporter Java agent jar.
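Both sets of metrics are scraped in Prometheus's text exposition format: one `name{labels} value` sample per line, with `# HELP` and `# TYPE` comment lines. A minimal parser, handy for checking what an endpoint exposes, might look like the sketch below. It is not the official Prometheus client and handles only simple lines (label escape sequences are left as-is). The sample used in the test is made up.

```python
import re

# metric_name{label="value",...} value [timestamp]
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> list:
    """Parse simple Prometheus text-format samples into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        match = _SAMPLE.match(line)
        if match is None:
            raise ValueError(f"unparseable sample: {line!r}")
        name, labels, value = match.groups()
        samples.append((name, dict(_LABEL.findall(labels or "")), float(value)))
    return samples
```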
Enable metrics
image: "gcr.io/spark-operator/spark:v2.4.4-gcs-prometheus"
monitoring:
  exposeDriverMetrics: true
  exposeExecutorMetrics: true
  prometheus:
    jmxExporterJar: "/prometheus/jmx_prometheus_javaagent-0.11.0.jar"
    port: 8090
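Under the hood, "run the JMX exporter as a Java agent" means adding a `-javaagent:<jar>=<port>:<config file>` flag to the driver and/or executor JVMs. The sketch below derives those Spark settings from the `monitoring` block above. The exporter config path `/etc/metrics/conf/prometheus.yaml` and the fallback port are assumptions, and the helpers are hypothetical rather than the operator's code.

```python
# Assumed location of the JMX exporter's rules file inside the pods.
DEFAULT_EXPORTER_CONFIG = "/etc/metrics/conf/prometheus.yaml"


def jmx_agent_option(jar: str, port: int, config_path: str) -> str:
    """JVM flag that starts the Prometheus JMX exporter as a Java agent."""
    return f"-javaagent:{jar}={port}:{config_path}"


def monitoring_conf(monitoring: dict, config_path: str = DEFAULT_EXPORTER_CONFIG) -> dict:
    """Spark conf entries that attach the agent to the driver and/or executors."""
    prom = monitoring["prometheus"]
    # 8090 used as a fallback because it is the port shown in the example spec.
    option = jmx_agent_option(prom["jmxExporterJar"], prom.get("port", 8090), config_path)
    conf = {}
    if monitoring.get("exposeDriverMetrics"):
        conf["spark.driver.extraJavaOptions"] = option
    if monitoring.get("exposeExecutorMetrics"):
        conf["spark.executor.extraJavaOptions"] = option
    return conf
```

A production version would append to any `extraJavaOptions` the user already set instead of overwriting them.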
Apache Flink
Apache Flink is an open source big data processing engine that provides the following:
● Scales to thousands of nodes.
● Runs on YARN, Mesos and Kubernetes.
● Provides checkpointing and savepointing facilities for fault tolerance, e.g., restarting without
loss of accumulated state.
● Provides queryable state support, so state can be exposed outside the app without an external
database.
● Provides window semantics; enables calculation of accurate aggregations, even for out-of-order
or late-arriving data.
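How window semantics cope with out-of-order data can be shown with a watermark: the highest event time seen so far minus a fixed out-of-orderness bound. A window fires once the watermark passes its last timestamp, so events that arrive a little late but before that point are still counted. The sketch below is plain Python illustrating these semantics, not Flink's API; it drops events that arrive after their window has fired, which is Flink's behavior when no allowed lateness is configured (not modeled here).

```python
from collections import defaultdict


class TumblingWindowCounter:
    """Count events per event-time tumbling window, firing windows by watermark.

    A window [start, start + size) fires once the watermark reaches its last
    timestamp (start + size - 1).
    """

    def __init__(self, size: int, max_out_of_orderness: int):
        self.size = size
        self.delay = max_out_of_orderness
        self.max_seen = None
        self.pending = defaultdict(int)  # window start -> count, not yet emitted
        self.fired = []                  # (window start, count) in emission order
        self.dropped = 0                 # late events discarded

    def watermark(self):
        return None if self.max_seen is None else self.max_seen - self.delay

    def on_event(self, ts: int) -> None:
        start = ts - ts % self.size
        wm = self.watermark()
        if wm is not None and start + self.size - 1 <= wm:
            self.dropped += 1  # window is already behind the watermark
            return
        self.pending[start] += 1
        self.max_seen = ts if self.max_seen is None else max(self.max_seen, ts)
        wm = self.watermark()
        for s in sorted(self.pending):  # sorted() copies keys, so popping is safe
            if s + self.size - 1 <= wm:
                self.fired.append((s, self.pending.pop(s)))
```

With a window size of 10 and a bound of 5, the events 1, 4, 12, 3, 16, 25, 2 fire window 0 with count 3 (the out-of-order 3 is included), fire window 10 with count 2, and drop the final 2 as late.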
Flink on Kubernetes
● Session Cluster
Long-running K8s Deployment that can run multiple Flink jobs. Each job is
submitted after the cluster is deployed.
● Job Cluster
Dedicated cluster that runs a single Flink job. The job jar is baked into the
image, so no separate submission is needed.
Flink on Kubernetes
Components:
● Job manager Deployment
● Task manager Deployment
● Job manager service
○ Enables the job manager and task managers to talk to each other
○ Exposes the web UI
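The job manager Service is what lets task managers find the job manager by a stable DNS name. The sketch below builds such a Service and the matching `flink-conf.yaml` entries. The ports follow Flink's defaults (6123 for RPC, 8081 for the REST API and web UI) plus a fixed blob port of 6124, which has to be pinned because the blob server picks a random port by default; the pod labels `app: flink` / `component: jobmanager` and the helper names are assumptions.

```python
def jobmanager_service(name: str, namespace: str = "default") -> dict:
    """Service in front of the Flink JobManager pod(s)."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "ClusterIP",
            # Assumed labels on the JobManager pods; must match the Deployment.
            "selector": {"app": "flink", "component": "jobmanager"},
            "ports": [
                {"name": "rpc", "port": 6123},   # jobmanager.rpc.port (Flink default)
                {"name": "blob", "port": 6124},  # blob.server.port, pinned to a fixed value
                {"name": "ui", "port": 8081},    # rest.port: web UI and REST API
            ],
        },
    }


def flink_conf_for(service: dict) -> dict:
    """flink-conf.yaml entries that point TaskManagers at the JobManager Service."""
    ports = {p["name"]: p["port"] for p in service["spec"]["ports"]}
    return {
        "jobmanager.rpc.address": service["metadata"]["name"],  # resolved via cluster DNS
        "jobmanager.rpc.port": str(ports["rpc"]),
        "blob.server.port": str(ports["blob"]),
    }
```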
Flink Operator
• Open source with Apache License 2.0 at
https://github.com/lyft/flinkk8soperator.
• Defines a CustomResourceDefinition (CRD), FlinkApplication, to represent a
Flink job.
• Uses a hybrid session-job cluster mode: a dedicated cluster is created for each
job, and the job is then submitted to that cluster.
Flink Operator: Architecture
Flink Operator: State Machine
Flink Operator: Job Spec
apiVersion: flink.k8s.io/v1beta1
kind: FlinkApplication
metadata:
  name: wordcount-operator-example
  namespace: flink-operator
spec:
  image: lightbend/flink-wordcount:latest
  imagePullPolicy: Always
  serviceAccountName: toned-guppy-flink
  flinkConfig:
    taskmanager.heap.size: 200
    state.backend.fs.checkpointdir: file:///checkpoints/flink/checkpoints
    state.checkpoints.dir: file:///checkpoints/flink/externalized-checkpoints
    state.savepoints.dir: file:///checkpoints/flink/savepoints
  jobManagerConfig:
    resources:
      requests:
        memory: "200Mi"
        cpu: "0.2"
    replicas: 1
  taskManagerConfig:
    taskSlots: 2
    resources:
      requests:
        memory: "200Mi"
        cpu: "0.2"
  flinkVersion: "1.8"
  jarName: "wordcount-operator-example-1.0.0-SNAPSHOT.jar"
  parallelism: 3
  entryClass: "org.apache.flink.WordCount"
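With the default slot sharing, a Flink job needs as many task slots as its parallelism, so `parallelism` and `taskSlots` together determine how many task managers the cluster needs. Our understanding is that the Lyft operator sizes the task manager Deployment this way (rounding up), but treat the rule below as an assumption. For the spec above, parallelism 3 with 2 slots per task manager gives 2 task managers and 1 idle slot.

```python
def task_manager_replicas(parallelism: int, task_slots: int) -> int:
    """Smallest number of TaskManagers whose slots cover the job's parallelism."""
    if parallelism < 1 or task_slots < 1:
        raise ValueError("parallelism and taskSlots must be positive")
    return -(-parallelism // task_slots)  # integer ceiling division


def idle_slots(parallelism: int, task_slots: int) -> int:
    """Slots provisioned but not needed by the job."""
    return task_manager_replicas(parallelism, task_slots) * task_slots - parallelism
```

Choosing `parallelism` as a multiple of `taskSlots` avoids paying for idle slots.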
Roll My Own Operator
For the path of least resistance, choose one of the following frameworks:
● kubebuilder: https://github.com/kubernetes-sigs/kubebuilder
● Operator SDK: https://github.com/operator-framework/operator-sdk
To see how things really work:
● client-go: https://github.com/kubernetes/client-go
● controller-runtime: https://github.com/kubernetes-sigs/controller-runtime/
https://github.com/lightbend/cloudflow
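One piece of machinery worth understanding when reading client-go is its work queue, which sits between watch events and the reconcile loop. A key is queued at most once while it waits, and a key that changes while a worker is processing it is re-queued when the worker calls `done`, so the same object is never reconciled by two workers at once. The sketch below is a single-threaded Python model loosely based on those semantics; it omits blocking, locking and rate limiting, and keys use the `namespace/name` form client-go conventionally uses.

```python
from collections import deque


class WorkQueue:
    """Deduplicating work queue modeled loosely on client-go's workqueue."""

    def __init__(self):
        self._queue = deque()
        self._dirty = set()       # keys waiting to be processed
        self._processing = set()  # keys currently handed to a worker

    def add(self, key: str) -> None:
        if key in self._dirty:
            return  # already waiting: collapse duplicate events
        self._dirty.add(key)
        if key not in self._processing:
            self._queue.append(key)  # otherwise re-queued by done()

    def get(self) -> str:
        key = self._queue.popleft()  # raises IndexError when empty; real queues block
        self._processing.add(key)
        self._dirty.discard(key)
        return key

    def done(self, key: str) -> None:
        self._processing.discard(key)
        if key in self._dirty:
            self._queue.append(key)  # changed while being processed: run again

    def __len__(self) -> int:
        return len(self._queue)
```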
THANK YOU!
QUESTIONS?

18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes
