Apache Kafka lessons learned @PAYBACK
Munich, 2017
https://quotefancy.com/
3
PAYBACK Global – One Global Platform for 3 markets…
Apache Kafka Lessons Learned @ PAYBACK
Monolithic CORE
3 tier JEE
+
Configuration
> 100 Million Customers
…
14 days
24 h
Product Backlog
Sprint Backlog
Sprint
Runnable Software
deploy
SIT
UAT
Partner test
Staging / NFR Test
Transition Go-Live
Monitoring
> 30 environments
> 200 servers
> 100 artefacts
Monthly Major Release
4
Architecture Blueprint
5
Our Use Case
sorry, it's not big data (yet)
6
Orchestration vs. Choreography – Business Process
Sam Newman 2015, Building Microservices, O'Reilly
7
Orchestration: Synchronous
Sam Newman 2015, Building Microservices, O'Reilly
PRO
• Easy to map code to business process
• Immediate feedback about every stage
• Atomic execution
CON
• Customer Service becomes the central place of logic
• Leads to "God" services
• Tight coupling, high cost of changes
• Resilience is complex (think retries, scaling…)
8
Choreography: Asynchronous, Event-driven
Sam Newman 2015, Building Microservices, O'Reilly
PRO
• Easier to achieve resilience and performance
• More decoupled
• Distributed logic
• Higher flexibility (changes, scaling)
CON
• Higher implementation effort & complexity
• Additional work for monitoring and tracking
• Additional SPOF
9
Resilience concerns the whole system
Loose coupling helps implement resilience patterns, but you need to care about:
○ delivery and processing semantics
○ retries and fallback strategy
○ handle timeouts and other communication errors
○ transaction handling
○ no silver bullet pattern for all event types
Resilience is about an ability to fully recover from failure - to self-heal
10
Choosing the right tool
We need to consider:
○ NFRs may be specific to the event type
○ Delivery semantics depend on the event type
- at most once
- at least once
- exactly once
○ Event order can be important for some use cases (FIFO)
○ Reprocessing must be possible
○ Monitoring and alerting must be well supported (APIs) due to the increased complexity
○ …
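As a rough illustration of how delivery semantics follow from the order of processing and offset commit, the sketch below simulates a consumer that crashes once in the middle of a batch. Everything in it (the `run` function, the in-memory list standing in for a topic, the crash point) is hypothetical; it is not Kafka client code.

```python
def run(events, commit_first, crash_at):
    """Simulate one consumer crash and restart over an in-memory log.

    commit_first=True  -> commit the offset, then process (at most once)
    commit_first=False -> process, then commit the offset (at least once)
    crash_at is the offset at which the consumer dies between the two steps.
    """
    committed = 0      # next offset to read after a restart
    processed = []     # side effects that actually happened
    crashed = False
    while committed < len(events):
        offset = committed
        if commit_first:
            committed = offset + 1
            if offset == crash_at and not crashed:
                crashed = True   # crash after commit, before processing
                continue         # restart resumes after the lost event
            processed.append(events[offset])
        else:
            processed.append(events[offset])
            if offset == crash_at and not crashed:
                crashed = True   # crash after processing, before commit
                continue         # restart re-reads the same offset
            committed = offset + 1
    return processed
```

With three events and a crash at offset 1, committing first loses the second event, while committing after processing delivers it twice. Getting an exactly-once effect therefore usually means at-least-once delivery combined with idempotent processing.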
11
pub&sub, high throughput, low latency, scalable, centralized, real-time
12
I have a joke about an event…
…But you might not get it
INCIDENTS
13
Cluster outage
- VMs stalled during snapshot backups, leading to cluster reconnects
- In 9 out of 10 cases recovery worked
- In 1 out of 10 cases this led to a single broker outside the cluster that still had partitions assigned (luckily it refused writes because of missing replicas)
Deactivate Backups!
Consider physical machines!
14
"A first sign of the beginning of understanding is the wish to die."
Franz Kafka
By Atelier Jacobi: Sigismund Jacobi (1860–1935) - http://www.bodleian.ox.ac.uk/news/2008_july_02, public domain, https://commons.wikimedia.org/w/index.php?curid=5428566
15
Configuration and implementation are complex
16
Producer
Implementation
○ The producer uses a non-blocking, asynchronous API
○ Two options for checking for failures:
- Block immediately for the response: send().get()
- Do follow-up work in a Callback
- Either way, be careful about handling failures
○ Don’t forget to close the producer! producer.close() blocks until previously sent requests complete
Configuration
○ acks – set to all
○ batch.size – set to 0
○ retries (defaults to 0) – think about increasing this value
- Not all errors are automatically retriable. Think about custom error handling on the producer side!
- Retries may affect message ordering
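The point about non-retriable errors can be sketched as a small wrapper around a blocking send. This is a library-agnostic illustration: `send_with_retries`, `RetriableError`, and `FatalError` are hypothetical names, and `send` stands for any callable that blocks until the broker acknowledges the record (the `send().get()` style).

```python
class RetriableError(Exception):
    """Hypothetical: transient failure, e.g. a leader election in progress."""


class FatalError(Exception):
    """Hypothetical: permanent failure, e.g. a record that is too large."""


def send_with_retries(send, record, retries, on_failure):
    """Try send(record) up to retries + 1 times.

    Retriable errors are retried; a fatal error, or running out of
    attempts, hands the record to on_failure (dead-letter store, alert, ...).
    Returns True if the record was acknowledged.
    """
    last_error = None
    for _ in range(retries + 1):
        try:
            send(record)
            return True
        except RetriableError as err:
            last_error = err          # transient: try again
        except FatalError as err:
            on_failure(record, err)   # retrying cannot help
            return False
    on_failure(record, last_error)
    return False
```

Note that retrying one record while later records are already in flight can reorder them, which is why limiting in-flight requests per connection matters when ordering is required.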
17
Consumer
Recommendations
o Note: the consumer is single-threaded – one consumer per thread
o Disable auto commit (enable.auto.commit = false)
o Commit explicit offsets using OffsetAndMetadata instead of committing everything
o Roll back with seek -> you need to know your last committed offset -> implement a rebalance listener (ConsumerRebalanceListener)
o Roll back (seek) after errors in the offset commit
o Change the default max.partition.fetch.bytes (1 MB can lead to session timeouts in versions < 0.10.x)
o Event processing should be idempotent – be prepared to handle duplicates
o Think about event reprocessing (how to change the offset, how to recreate an event, etc.)
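One way to make event processing idempotent is to remember which event IDs have already been applied and skip repeats. The sketch assumes each event carries a unique `id` field and a `points` amount, and it keeps the seen-set in memory; all of that is made up for illustration (a real service would persist the IDs together with the state change).

```python
class IdempotentHandler:
    """Applies each event at most once, keyed by a unique event ID."""

    def __init__(self):
        self.seen_ids = set()   # in-memory only; persist this in practice
        self.balance = 0        # example state changed by events

    def handle(self, event):
        """Return True if the event was applied, False if it was a duplicate."""
        if event["id"] in self.seen_ids:
            return False
        self.balance += event["points"]
        self.seen_ids.add(event["id"])
        return True
```

With a handler like this, a redelivered event after a rollback or rebalance leaves the state unchanged, so committing offsets after processing becomes safe.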
18
Other basic configuration
Be Safe, Not Sorry
o acks = all
o block.on.buffer.full = true
o Producer retries = MAX_INT
o ( max.in.flight.requests.per.connection = 1 )
o producer.close()
o Replication factor >= 3
o min.insync.replicas = 2
o unclean.leader.election.enable = false
o enable.auto.commit = false
o Commit after processing
o Monitor!
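The interplay of `acks = all`, the replication factor, and `min.insync.replicas` comes down to simple arithmetic: with `acks = all`, a partition keeps accepting writes as long as at least `min.insync.replicas` replicas are in sync. The helper below is a hypothetical illustration, not part of any Kafka tooling.

```python
def tolerated_replica_losses(replication_factor, min_insync_replicas):
    """Replicas that may drop out while acks=all writes still succeed."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("need 1 <= min.insync.replicas <= replication factor")
    return replication_factor - min_insync_replicas
```

With a replication factor of 3 and `min.insync.replicas = 2`, one replica can be lost without interrupting producers; losing a second one makes the brokers reject writes instead of risking data loss.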
19
Monitoring
http://www.spiegel.de/spiegel/print/d-129456859.html
20
Timeseries Metrics to Graphite
[Diagram: three KafkaBroker instances and a KafkaConsumer, each with Metrics Library + Graphite Reporter; Graphite; Grafana]
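For orientation, Graphite's plaintext protocol expects one `path value timestamp` line per data point, which Carbon accepts on TCP port 2003 by default. The sketch below only formats such a line; the function name and the metric path in the usage note are made-up examples, and the reporters in the diagram use their own naming.

```python
import time


def graphite_line(path, value, timestamp=None):
    """Format one data point for Graphite's plaintext protocol."""
    if timestamp is None:
        timestamp = int(time.time())   # seconds since the epoch
    return f"{path} {value} {int(timestamp)}\n"
```

For example, `graphite_line("kafka.vp2.mdeAppGroup.totalLag", 142, 1485254054)` yields a line that can be written to a socket connected to the Carbon host.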
21
Kafka-Manager: Open Source UI/API Kafka Mgmt Tool
https://github.com/yahoo/kafka-manager
• Good for current cluster status and ad-hoc analysis
• Provides a status API (HTTP)
• Consumers are only displayed during active consumption
• 0.10.x support still not merged
22
Kafka-Manager API Example
curl -XGET http://kafka-manager/api/status/VP2/mdeAppGroup/groupSummary?consumerType=KF
{memberDataChanges:
{totalLag: 142,
percentageCovered: 100,
partitionOffsets:
[1779279,
372957,
368100,
372415,
368349,
374649,
373262,
373934,
1775065,
373339,
369416,
374362,
[…]
23
Burrow: API only Consumer Lag Checking
{
error: false,
message: "consumer group status returned",
status: {
cluster: "vp2",
group: "mdeAppGroup",
status: "ERR",
complete: false,
partitions: [
{
topic: "memberDataChanges",
partition: 1,
status: "STOP",
start: {
offset: 1775109,
timestamp: 1485253978439,
lag: 0
},
end: {
offset: 1775127,
timestamp: 1485254054861,
lag: 1
}
},},
[…]
totallag: 8
},
request:
{url: "/v2/kafka/vp2/consumer/mdeAppGroup/lag",
host: "hqiqlpxxap89",
cluster: "vp2",
group: "mdeAppGroup",
topic: ""
}
}
curl -XGET http://burrow/v2/kafka/vp2/consumer/mdeAppGroup/lag
No Thresholds required
Alerting via email and HTTP POST
Issue: Calculate lag at request time, not commit time
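A monitoring check built on this endpoint could look roughly like the following. The sketch works on a response that has already been parsed into a dictionary with the field names from the sample output; `summarize_lag` is a hypothetical helper, and treating every status other than `OK` as alert-worthy is an assumption rather than a documented contract.

```python
def summarize_lag(response):
    """Return (needs_alert, total_lag, bad_partitions) for one consumer group."""
    status = response["status"]
    bad_partitions = [
        (p["topic"], p["partition"], p["status"])
        for p in status.get("partitions", [])
        if p["status"] != "OK"
    ]
    needs_alert = bool(response["error"]) or status["status"] != "OK"
    return needs_alert, status.get("totallag", 0), bad_partitions


# Trimmed copy of the sample response, written as a Python dict.
sample = {
    "error": False,
    "status": {
        "group": "mdeAppGroup",
        "status": "ERR",
        "partitions": [
            {"topic": "memberDataChanges", "partition": 1, "status": "STOP"},
        ],
        "totallag": 8,
    },
}
```

Calling `summarize_lag(sample)` flags the group for alerting and names the stopped partition, which is enough to drive an email or HTTP POST notification.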
24
"God gives the nuts, but he does not crack them."
Franz Kafka
PAYBACK GmbH
Maxim Schelest
Thomas Falkenberg
Theresienhöhe 12
80339 München
Phone +49 (0) 89 997 41 – 0
PAYBACK.net | PAYBACK.de
