Mason Chen | Apple
Multi Cluster Kafka Source
THIS IS NOT A CONTRIBUTION
Agenda
Motivation

FLIP 27 Kafka Source

Source Design

Example
Flink Kafka Pipeline
Manual Migration Steps
• Bring up new cluster

• Swap producer

• Wait for consumer to drain

• Change source uid and cluster

• Upgrade with non-restored state

• Increase parallelism to catch up with lag

• Revert to steady state

• When can we remove the inactive cluster?
User Manual Migration Steps
• Change source uid

• Change bootstrap server

• Upgrade application

• With non-restored state

• Change parallelism and resources to catch up with lag

• Revert to steady state when caught up
Manual Migration Steps: Drawbacks
• Application downtime

• Need to increase system resources for catchup

• User manual toil

• User could have 100+ jobs

• Multiple hours of team coordination
Scaling Multiple Kafka Clusters
• Hybrid cloud: on-prem, private cloud and public cloud providers

• Scalability

• Topic sharding

• Operability and Failover

• In place upgrade is complex and error prone
FLIP 27 Source
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/sources/
FLIP 27 Kafka Source
Kafka Metadata Service
• KafkaStream

• Logical abstraction to physical clusters and topics

• describeStreams(Collection<String> streamIds);

• Pluggable implementation

• File based configmap
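
The metadata service described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: only the name `describeStreams(Collection<String> streamIds)` comes from the slide. The return type, the `KafkaStream` fields, and the `InMemoryMetadataService` class (a stand-in for a file-based configmap implementation) are assumptions made for the example.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of a logical stream: cluster id -> topics on that cluster. */
final class KafkaStream {
    final String streamId;
    final Map<String, Set<String>> topicsByCluster;

    KafkaStream(String streamId, Map<String, Set<String>> topicsByCluster) {
        this.streamId = streamId;
        this.topicsByCluster = Map.copyOf(topicsByCluster);
    }
}

/** Pluggable lookup from logical stream ids to physical clusters and topics. */
interface KafkaMetadataService {
    // Return type is an assumption; the slide shows only the method name and parameter.
    Map<String, KafkaStream> describeStreams(Collection<String> streamIds);
}

/** In-memory stand-in for a file- or configmap-backed implementation. */
final class InMemoryMetadataService implements KafkaMetadataService {
    private final Map<String, KafkaStream> streams = new HashMap<>();

    void put(KafkaStream stream) {
        streams.put(stream.streamId, stream);
    }

    @Override
    public Map<String, KafkaStream> describeStreams(Collection<String> streamIds) {
        Map<String, KafkaStream> result = new HashMap<>();
        for (String id : streamIds) {
            KafkaStream stream = streams.get(id);
            if (stream != null) {
                result.put(id, stream); // unknown stream ids are simply omitted
            }
        }
        return result;
    }
}
```

Because the job refers only to a stream id, the physical clusters and topics behind that id can change in the metadata service without a change to the job itself.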
Multi Cluster Kafka Source
Runtime
Extension of FLIP 27 Major Components
• Kafka Source components

• Polling, commit, checkpoint, split assignment

• Source Event RPC

• Enumerator Context Proxy

• Split assignment and wrapping cluster info

• Context thread pools
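
One way to picture "split assignment and wrapping cluster info" is a split type that carries the id of the cluster it belongs to. The sketch below is illustrative only: `ClusterAwareSplit`, its fields, and `groupByCluster` are hypothetical names for this example, not the classes used by the source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical wrapper: a Kafka partition split tagged with its owning cluster. */
final class ClusterAwareSplit {
    final String clusterId;
    final String topic;
    final int partition;

    ClusterAwareSplit(String clusterId, String topic, int partition) {
        this.clusterId = clusterId;
        this.topic = topic;
        this.partition = partition;
    }

    /** The same topic-partition can exist on two clusters, so the id includes the cluster. */
    String splitId() {
        return clusterId + "/" + topic + "-" + partition;
    }

    /** Groups splits by cluster so each batch can be handed to that cluster's reader. */
    static Map<String, List<ClusterAwareSplit>> groupByCluster(List<ClusterAwareSplit> splits) {
        Map<String, List<ClusterAwareSplit>> byCluster = new TreeMap<>();
        for (ClusterAwareSplit split : splits) {
            byCluster.computeIfAbsent(split.clusterId, id -> new ArrayList<>()).add(split);
        }
        return byCluster;
    }
}
```

Including the cluster id in the split id keeps `orders-0` on the old cluster distinct from `orders-0` on the new one while both are being read during a migration.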
Migration with Multi Cluster Kafka Source
• Initial metadata

• Bring up new cluster

• Add new cluster metadata

• Reconcile metadata

• Remove old cluster

• Reconcile metadata

• Remove old cluster
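
The "reconcile metadata" step can be thought of as a set difference between the clusters the source is currently reading and the clusters listed in the latest metadata. The following is a simplified sketch under that assumption; `MetadataReconciler` and its methods are hypothetical names, and the real source would also have to handle topics, offsets, and reader lifecycle.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper that diffs active clusters against the latest metadata. */
final class MetadataReconciler {

    /** Clusters present in the latest metadata that the source is not yet reading. */
    static Set<String> clustersToAdd(Set<String> active, Set<String> latest) {
        Set<String> toAdd = new TreeSet<>(latest);
        toAdd.removeAll(active);
        return toAdd;
    }

    /** Clusters the source is reading that no longer appear in the latest metadata. */
    static Set<String> clustersToRemove(Set<String> active, Set<String> latest) {
        Set<String> toRemove = new TreeSet<>(active);
        toRemove.removeAll(latest);
        return toRemove;
    }
}
```

In the migration above, adding the new cluster to the metadata gives a non-empty "to add" set, and removing the old cluster from the metadata later gives a non-empty "to remove" set.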
User Cluster Migration Steps
Multi Cluster Kafka Source Benefits
• Migrations and failover automated transparently within source

• Simplify operations between compute and storage infra

• Hybrid Source compatible

• Can be leveraged for topic migration
Future Work
• Integrate with split level watermark alignment

• Optimizations to remove only affected readers

• FLIP-246 (https://cwiki.apache.org/confluence/display/FLINK/FLIP-246%3A+Multi+Cluster+Kafka+Source)
Q&A

Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Flink Job Downtime