Jun Rao
Confluent, Inc
Building Large-Scale Stream Infrastructures
Across Multiple Data Centers with Apache Kafka
Outline
• Kafka overview
• Common multi data center patterns
• Future stuff
What’s Apache Kafka
Distributed, high throughput pub/sub system
Kafka usage
Common use case
• Large scale real time data integration
Other use cases
• Scaling databases
• Messaging
• Stream processing
• …
Why multiple data centers (DC)
• Disaster recovery
• Geo-localization
• Saving cross-DC bandwidth
• Security
What’s unique with Kafka multi DC
• Consumers run continuously and keep state (offsets)
• Challenge: recovering that state during DC failover
Pattern #1: stretched cluster
• Typically done on AWS in a single region
• Deploy Zookeeper and broker across 3 availability zones
• Rely on intra-cluster replication to replicate data across DCs
[Diagram: a single Kafka cluster stretched across DC 1, DC 2, and DC 3, with producers and consumers in each DC]
On DC failure
[Diagram: the Kafka cluster stretched across DC 1, DC 2, and DC 3, with producers and consumers shown in only two of the three DCs]
• Producer/consumer fail over to new DCs
• Existing data preserved by intra-cluster replication
• Consumers resume from the last committed offsets and see the same data
When DC comes back
• Intra-cluster replication automatically re-replicates all missing data
• When re-replication completes, switch producers/consumers back
[Diagram: the Kafka cluster stretched across DC 1, DC 2, and DC 3, with producers and consumers again in each DC]
Be careful with replica assignment
• Don’t want all replicas in same AZ
• Rack-aware support in 0.10.0
• Configure brokers in same AZ with same broker.rack
• Manual replica assignment pre 0.10.0
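The goal of rack-aware placement can be illustrated with a small sketch. This is a simplified model, not Kafka's actual assignment algorithm: brokers are interleaved by rack (the value each broker would carry in broker.rack) and each partition takes consecutive brokers from that interleaved list. The broker IDs, AZ names, and function names are hypothetical, and the guarantee of one replica per AZ only holds here because every AZ has the same number of brokers.

```python
from collections import defaultdict
from itertools import zip_longest


def rack_alternating_brokers(broker_racks):
    """Order brokers so that neighbours in the list sit in different racks."""
    by_rack = defaultdict(list)
    for broker, rack in sorted(broker_racks.items()):
        by_rack[rack].append(broker)
    columns = [by_rack[rack] for rack in sorted(by_rack)]
    # Take the first broker of every rack, then the second of every rack, ...
    return [b for row in zip_longest(*columns) for b in row if b is not None]


def assign_replicas(broker_racks, num_partitions, replication_factor):
    """Give each partition consecutive brokers from the interleaved list."""
    ordered = rack_alternating_brokers(broker_racks)
    return {
        p: [ordered[(p + i) % len(ordered)] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


# Hypothetical layout: six brokers, two per availability zone.
racks = {0: "az-a", 1: "az-a", 2: "az-b", 3: "az-b", 4: "az-c", 5: "az-c"}
assignment = assign_replicas(racks, num_partitions=6, replication_factor=3)
for partition, replicas in assignment.items():
    print(partition, replicas, [racks[b] for b in replicas])
```

With this layout every partition ends up with one replica in each AZ, which is the property the slide asks for; a manual assignment for pre-0.10.0 clusters would aim for the same outcome.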
Stretched cluster NOT recommended across regions
• Asymmetric network partitioning
• Longer network latency => longer produce/consume time
• Cross-region bandwidth: Kafka has no read affinity
[Diagram: region 1, region 2, and region 3, each hosting Kafka brokers and ZooKeeper (ZK) nodes of the same cluster]
Pattern #2: active/passive
• Producers in active DC
• Consumers in either active or passive DC
[Diagram: DC 1 with a Kafka cluster, producers, and consumers; MirrorMaker between DC 1 and DC 2; DC 2 with a Kafka cluster and consumers]
What’s MirrorMaker
• Read from a source cluster and write to a target cluster
• Per-key ordering preserved
• Asynchronous: target always slightly behind
• Offsets not preserved
• Source and target may not have same # partitions
• Retries for failed writes
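A toy model can show why mirroring keeps per-key order but not offsets. The sketch below is only an illustration under stated assumptions: clusters are lists of in-memory logs, the partitioner is a stand-in (`key % num_partitions`, not Kafka's default hash), and the mirror reads source partitions one after another, whereas the real MirrorMaker consumes them concurrently. All function names are hypothetical.

```python
def new_cluster(num_partitions):
    """A cluster is modeled as a list of partition logs."""
    return [[] for _ in range(num_partitions)]


def partition_for(key, num_partitions):
    # Stand-in partitioner; the only property that matters is that
    # the same key always maps to the same partition.
    return key % num_partitions


def append(cluster, key, value):
    cluster[partition_for(key, len(cluster))].append((key, value))


def mirror(source, target):
    """Re-produce every source record into the target cluster."""
    for log in source:
        for key, value in log:
            append(target, key, value)


def locate(cluster, key, value):
    """Return (partition, offset) of a record, or None if absent."""
    for partition, log in enumerate(cluster):
        if (key, value) in log:
            return partition, log.index((key, value))
    return None


def values_for_key(cluster, key):
    log = cluster[partition_for(key, len(cluster))]
    return [v for k, v in log if k == key]


source = new_cluster(4)   # source has 4 partitions
target = new_cluster(2)   # target has only 2
for seq in range(3):
    for key in range(4):
        append(source, key, seq)
mirror(source, target)

print(locate(source, 2, 0))       # (2, 0)
print(locate(target, 2, 0))       # (0, 3)
print(values_for_key(target, 2))  # [0, 1, 2]
```

The record with key 2 and value 0 sits at offset 0 in the source but at offset 3 in the target, while the values for key 2 still appear in their original order. This is why a committed offset from one cluster cannot simply be reused on the other.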
On active DC failure
• Fail over producers/consumers to passive cluster
• Challenge: which offset to resume consumption from
• Offsets not identical across clusters
[Diagram: DC 1 with a Kafka cluster, producers, and consumers; MirrorMaker between DC 1 and DC 2; DC 2 with a Kafka cluster]
Solutions for switching consumers
• Resume from smallest offset
• Duplicates
• Resume from largest offset
• May miss some messages (likely acceptable for real time consumers)
• Set offset based on timestamp
• Current API is hard to use and not precise
• A better and more precise API is being worked on (KIP-33, KIP-79)
• Preserve offsets during mirroring
• Harder to do
• No timeline yet
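The timestamp option above can be sketched as a lookup over one partition's message timestamps. This is a minimal model, not the Kafka API: it assumes timestamps are non-decreasing within the partition and that the log starts at offset 0, and the function names and the safety margin are hypothetical. Rewinding by a margin trades a few duplicates for not missing messages that the mirror delivered late.

```python
from bisect import bisect_left


def offset_for_timestamp(timestamps, target_ts):
    """Earliest offset whose timestamp is >= target_ts, or None if none exists."""
    idx = bisect_left(timestamps, target_ts)
    return idx if idx < len(timestamps) else None


def failover_offset(timestamps, last_processed_ts, safety_margin_ms):
    """Offset to resume from on the surviving cluster after a failover."""
    offset = offset_for_timestamp(timestamps, last_processed_ts - safety_margin_ms)
    # If nothing is that recent, start at the log end (nothing left to replay).
    return len(timestamps) if offset is None else offset


# Timestamps (ms) of the messages at offsets 0..4 in the target cluster.
target_log_ts = [1000, 1010, 1020, 1030, 1040]
print(offset_for_timestamp(target_log_ts, 1025))                 # 3
print(failover_offset(target_log_ts, 1030, safety_margin_ms=15))  # 2
```

A consumer that last processed a message stamped 1030 would resume at offset 2 with a 15 ms margin, reprocessing two messages rather than risking a gap.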
When DC comes back
• Need to reverse mirroring
• Similar challenge for determining the offsets in MirrorMaker
[Diagram: DC 1 with a Kafka cluster, producers, and consumers; MirrorMaker between DC 1 and DC 2; DC 2 with a Kafka cluster]
Limitations
• MirrorMaker reconfiguration after failover
• Resources in passive DC underutilized
Pattern #3: active/active
• Local => aggregate mirroring to avoid cycles
• Producers/consumers in both DCs
• Producers only write to local clusters
[Diagram: DC 1 and DC 2 each run a local Kafka cluster and an aggregate Kafka cluster; producers write to the local clusters, MirrorMaker copies from the local clusters into the aggregate clusters, and consumers run in both DCs]
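The "avoid cycles" point can be checked mechanically by treating each mirroring flow as a directed edge between clusters. The sketch below assumes each local cluster is mirrored into both aggregate clusters; the cluster names are hypothetical and the cycle check is an ordinary depth-first search, not anything MirrorMaker itself does.

```python
def has_cycle(edges):
    """Return True if the directed graph given as (src, dst) pairs has a cycle."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:
                return True  # back edge: data would loop forever
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


# Local => aggregate mirroring: data only ever flows one way.
local_to_aggregate = [
    ("dc1-local", "dc1-aggregate"),
    ("dc1-local", "dc2-aggregate"),
    ("dc2-local", "dc1-aggregate"),
    ("dc2-local", "dc2-aggregate"),
]
# Naive bidirectional mirroring of the same topics between two clusters.
bidirectional = [("dc1", "dc2"), ("dc2", "dc1")]

print(has_cycle(local_to_aggregate))  # False
print(has_cycle(bidirectional))       # True
```

Because producers only write to local clusters and nothing is mirrored out of an aggregate cluster, a message can never be copied back to where it started.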
On DC failure
• Same challenge when moving consumers between aggregate clusters
• Offsets in the 2 aggregate clusters are not identical
[Diagram: DC 1 and DC 2, each with a local Kafka cluster, an aggregate Kafka cluster, producers, and consumers, connected by MirrorMaker]
When DC comes back
• No need to reconfigure MirrorMaker
[Diagram: DC 1 and DC 2, each with a local Kafka cluster, an aggregate Kafka cluster, producers, and consumers, connected by MirrorMaker]
Beyond 2 DCs
• More DCs => better resource utilization
• With 2 DCs, each DC needs to provision 100% traffic
• With 3 DCs, each DC only needs to provision 50% traffic
• Setting up MirrorMaker with many DCs can be daunting
• Only set up aggregate clusters in 2-3 DCs
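The 100% and 50% figures follow from a simple rule, sketched here under two assumptions that the slide does not state explicitly: traffic is spread evenly across DCs, and the design must survive the loss of exactly one DC. The function names are hypothetical.

```python
from fractions import Fraction


def per_dc_capacity(num_dcs):
    """Share of total traffic each DC must handle if one DC can fail."""
    if num_dcs < 2:
        raise ValueError("need at least 2 DCs to survive a DC failure")
    # The surviving (num_dcs - 1) DCs split the full load evenly.
    return Fraction(1, num_dcs - 1)


def total_provisioned(num_dcs):
    """Total capacity across all DCs, as a multiple of actual traffic."""
    return num_dcs * per_dc_capacity(num_dcs)


for n in (2, 3, 4):
    print(n, per_dc_capacity(n), total_provisioned(n))
```

With 2 DCs each must carry 1 (100%) of the traffic, for 2x total capacity; with 3 DCs each carries 1/2 (50%), for 1.5x total; with 4 DCs the total drops to 4/3.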
Comparison
Stretched
• Pros: better utilization of resources; easy failover for consumers
• Cons: still need a cross-region story
Active/passive
• Pros: needed for global ordering
• Cons: harder failover for consumers; reconfiguration during failover; resource under-utilization
Active/active
• Pros: better utilization of resources
• Cons: harder failover for consumers; extra aggregate clusters
Multi-DC beyond Kafka
• Kafka often used together with other data stores
• Need to make sure multi-DC strategy is consistent
Example application
• Consumer reads from Kafka and computes 1-min count
• Counts need to be stored in DB and available in every DC
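The example application can be sketched without any Kafka dependency. In this minimal model the consumer's input is just a list of event timestamps in milliseconds, and the database is a plain dict keyed by window start; both are stand-ins, and the function names are hypothetical. Writing each window's count as an overwrite keyed by window start is an assumption made here so that replaying the same events after a failover does not double count.

```python
from collections import Counter

WINDOW_MS = 60_000  # 1-minute tumbling windows


def one_minute_counts(event_timestamps_ms):
    """Count events per 1-minute window, keyed by the window's start time."""
    counts = Counter()
    for ts in event_timestamps_ms:
        window_start = ts - ts % WINDOW_MS
        counts[window_start] += 1
    return dict(counts)


def store_counts(db, counts):
    """Upsert window counts; overwriting makes a full replay harmless."""
    db.update(counts)


events = [0, 59_999, 60_000, 61_000, 180_000]
db = {}
store_counts(db, one_minute_counts(events))
store_counts(db, one_minute_counts(events))  # replay after a failover
print(db)  # {0: 2, 60000: 2, 180000: 1}
```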
Independent database per DC
• Run same consumer concurrently in both DCs
• No consumer failover needed
[Diagram: the active/active topology (local and aggregate Kafka clusters, producers, MirrorMaker) with one consumer and one DB in each of DC 1 and DC 2]
Stretched database across DCs
• Only run one consumer per DC at any given point of time
[Diagram: the active/active topology (local and aggregate Kafka clusters, producers, MirrorMaker) with a consumer and a DB node in each of DC 1 and DC 2; one consumer is marked "on failover"]
Other considerations
• Enable SSL in MirrorMaker
• Encrypt data transfer across DCs
• Performance tuning
• Running multiple instances of MirrorMaker
• May want to use RoundRobin partition assignment for more parallelism
• Tuning socket buffer size to amortize long network latency
• Where to run MirrorMaker
• Prefer close to target cluster
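For the socket buffer bullet, a common rule of thumb (not stated on the slide) is to size buffers to at least the bandwidth-delay product of the cross-DC link, so a single connection can keep the link full while waiting for acknowledgements. The sketch below is plain arithmetic with hypothetical link figures; the result would be a starting point for settings such as the consumer's `receive.buffer.bytes` and the producer's `send.buffer.bytes`, subject to operating system limits.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_sec, rtt_ms):
    """Bytes that can be in flight on a link with the given bandwidth and RTT."""
    return bandwidth_bits_per_sec * rtt_ms // (8 * 1000)


# Hypothetical cross-DC link: 1 Gbit/s with a 100 ms round trip.
print(bandwidth_delay_product_bytes(1_000_000_000, 100))  # 12500000

# The same link inside one DC with a 1 ms round trip needs far less.
print(bandwidth_delay_product_bytes(1_000_000_000, 1))    # 125000
```

The hundredfold difference between the two results is why defaults that work within one DC can throttle a mirror running across DCs.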
Future work
• KIP-33, KIP-79: timestamp index
• Allow consumers to seek based on timestamp
• Integration with Kafka Connect for data ingestion
• Offset preserving mirroring
THANK YOU!
Jun Rao| jun@confluent.io | @junrao
Kafka Training with Confluent University
• Kafka Developer and Operations Courses
• Visit www.confluent.io/training
Want more Kafka?
• Download Confluent Platform Enterprise at http://www.confluent.io/product
• Apache Kafka 0.10 upgrade documentation at http://docs.confluent.io/3.0.0/upgrade.html
• Kafka Summit recordings now available at http://kafka-summit.org/schedule/

