© 2017 IBM Corporation
Edoardo Comar ecomar@uk.ibm.com
Andrew Schofield andrew_schofield@uk.ibm.com
IBM Message Hub
Kafka and the Polyglot Programmer
A brief overview of the Kafka clients ecosystem
Session objectives
• Show some comparable Kafka usage from different languages
• Show a little of what goes underneath a Kafka client
• Appreciate the heavy lifting a client has to do
• We’ll proceed in reverse order though 🙂
How does an application talk to Kafka ?
• Protocol-level client libraries (implementing the Kafka “wire” API)
• The “official” Java client
• librdkafka C/C++ library and wrappers for other languages
• Other clients from a large open-source ecosystem
• Alternative “message-level” APIs
• Kafkacat, REST
• Higher-level APIs
• Kafka Connect, Kafka Streams
What is the Kafka protocol (the ‘wire’ API) ?
• A set of Request/Response message pairs
• e.g.: ProduceRequest, ProduceResponse (ApiKey=0)
• A set of error codes
• e.g.: Unknown Topic Or Partition (code=3)
• Messages exchanged using Kafka’s own binary protocol
• Over TCP (or TLS)
• It’s not HTTP, AMQP, MQTT …
• All requests initiated by the clients.
• Brokers send Responses
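A minimal sketch of what "Request/Response over Kafka's own binary protocol" means on the wire, assuming the classic request header layout of the 0.10/0.11 era (api_key, api_version, correlation_id, client_id) preceded by a 4-byte size. The function name encode_request and the client id "demo" are hypothetical; ApiVersions (ApiKey=18) version 0 is used because its body is empty.

```python
import struct

def encode_request(api_key: int, api_version: int, correlation_id: int,
                   client_id: str, body: bytes = b"") -> bytes:
    """Frame one request: int32 size, then header, then request body."""
    cid = client_id.encode("utf-8")
    # Header: int16 api_key, int16 api_version, int32 correlation_id (big-endian)
    header = struct.pack(">hhi", api_key, api_version, correlation_id)
    # client_id is a string: int16 length followed by UTF-8 bytes
    header += struct.pack(">h", len(cid)) + cid
    payload = header + body
    # The size prefix counts everything after itself
    return struct.pack(">i", len(payload)) + payload

# ApiVersions request, version 0, empty body (values chosen for illustration)
frame = encode_request(api_key=18, api_version=0, correlation_id=1, client_id="demo")
print(frame.hex())
```

The broker echoes the correlation_id in its response, which is how a client matches responses to the requests it initiated.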
Kafka’s TCP binary protocol
• Open-source protocol (obviously!)
• Messages defined in terms of Serializable data structures
• Primitive types (intNN, nullable string) + Arrays
• Struct types, e.g. RecordBatch for sequence of Records (key, value, metadata)
• Clients typically hold multiple long-lived TCP connections
• One per broker node
• Clients expected to use non-blocking I/O
http://kafka.apache.org/protocol
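The primitive types above can be sketched in a few lines. This assumes the classic encodings from the protocol guide: a nullable string is an int16 length (-1 for null) plus UTF-8 bytes, and an array is an int32 element count followed by the elements. The helper names are hypothetical.

```python
import struct
from typing import List, Optional, Tuple

def encode_nullable_string(s: Optional[str]) -> bytes:
    """int16 length followed by UTF-8 bytes; length -1 means null."""
    if s is None:
        return struct.pack(">h", -1)
    data = s.encode("utf-8")
    return struct.pack(">h", len(data)) + data

def decode_nullable_string(buf: bytes, pos: int = 0) -> Tuple[Optional[str], int]:
    """Return (value, next position) so callers can keep reading the buffer."""
    (length,) = struct.unpack_from(">h", buf, pos)
    pos += 2
    if length < 0:
        return None, pos
    return buf[pos:pos + length].decode("utf-8"), pos + length

def encode_string_array(items: List[str]) -> bytes:
    """int32 element count followed by each encoded element."""
    out = struct.pack(">i", len(items))
    for item in items:
        out += encode_nullable_string(item)
    return out

encoded = encode_nullable_string("test")
decoded, next_pos = decode_nullable_string(encoded)
```

Struct types are then just these primitives written one after another in a fixed order.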
Kafka message capture with Wireshark
$ bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Anatomy of a wire message
[Figure: wire message layout for Magic 1 (v0.10.x)]
In 0.11 RecordBatch superseded MessageSet
• Magic value = 2
• Records have Headers (KIP-82)
• They look like footers 😀
• Metadata for Exactly-Once Semantics
• Space savings for large batches
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-Messagesets
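Part of the space saving in the magic 2 format comes from storing per-record fields (lengths, offset and timestamp deltas) as variable-length integers. A minimal sketch of that encoding, assuming the zigzag varint scheme also used by Protocol Buffers; the function names are hypothetical.

```python
from typing import Tuple

def encode_varint(n: int) -> bytes:
    """Zigzag-encode a signed int, then emit 7 bits per byte, low bits first."""
    z = (n << 1) if n >= 0 else ((-n) << 1) - 1   # zigzag: 0,-1,1,-2 -> 0,1,2,3
    out = bytearray()
    while z >= 0x80:
        out.append((z & 0x7F) | 0x80)   # high bit set: more bytes follow
        z >>= 7
    out.append(z)
    return bytes(out)

def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, next position)."""
    z = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        z |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos   # undo zigzag

small = encode_varint(1)      # one byte instead of a fixed 4 or 8
larger = encode_varint(300)   # two bytes
```

Small deltas, which dominate in a large batch, therefore cost a single byte each.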
It’s an open-source protocol …
… so anyone can write a client, in any language ?
• In theory, yes
• In practice, it’s a very big investment
• A lot of intelligence goes into the client
• Partitioning
• Consumer Group assignment
• Complexity has grown a lot since 0.8 …
• Consumer group protocol
• Security protocols/SASL mechanisms
• KIP-4 (administrative actions)
• Exactly-Once Semantics
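Two of the client-side jobs listed above, sketched under stated assumptions. partition_for uses CRC32 as a stand-in hash (the Java client's default partitioner hashes keys with murmur2, so the partition numbers here will not match it). range_assign follows the idea of a range-style assignor for one topic: sort the members, give each a contiguous block, and hand the remainder to the first members. Both function names are hypothetical.

```python
import zlib
from typing import Dict, List

def partition_for(key: bytes, num_partitions: int) -> int:
    """Same key always maps to the same partition (CRC32 is a stand-in hash)."""
    return zlib.crc32(key) % num_partitions

def range_assign(members: List[str], num_partitions: int) -> Dict[str, List[int]]:
    """Split partitions 0..n-1 into contiguous ranges across sorted members."""
    ordered = sorted(members)
    base, extra = divmod(num_partitions, len(ordered))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, member in enumerate(ordered):
        count = base + (1 if i < extra else 0)   # first `extra` members get one more
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment

plan = range_assign(["consumer-2", "consumer-1"], 5)
# plan == {"consumer-1": [0, 1, 2], "consumer-2": [3, 4]}
```

In a real client the assignment runs on one group member and is distributed through the consumer group protocol, which is a large part of the complexity the slide refers to.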
The evolution of the Kafka API
Kafka version   Released    # of API keys (RPCs)   # of error codes (*)
0.7.2           Nov 2012    5                      6
0.8.0           Nov 2013    8                      13
0.9.0           Nov 2015    17                     33
0.10.0          May 2016    19                     37
0.10.2          Feb 2017    21                     46
0.11.0          June 2017   33                     55
(*) Including -1 UNKNOWN and 0 NO_ERROR
• Brokers support older clients
• Recent clients support somewhat older brokers
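Compatibility in both directions comes down to agreeing on a version per API key. A minimal sketch, assuming the broker advertises a (min, max) supported range for each API (as it does in response to an ApiVersions request) and the client picks the highest version both sides understand. The function name and the example ranges are hypothetical.

```python
from typing import Optional, Tuple

def pick_version(client: Tuple[int, int], broker: Tuple[int, int]) -> Optional[int]:
    """Highest version inside both (min, max) ranges, or None if they do not overlap."""
    lowest = max(client[0], broker[0])
    highest = min(client[1], broker[1])
    return highest if lowest <= highest else None

# Newer client, older broker: fall back to the broker's maximum
chosen = pick_version(client=(0, 3), broker=(0, 2))
# Client too new for this broker: no common version
unsupported = pick_version(client=(2, 5), broker=(0, 1))
```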
What makes for a good choice of client?
• Good support for the features of Apache Kafka
• Message keys, committing offsets, exactly-once semantics, ...
• Blending natural idioms of the language with proper use of Kafka
• Solid software engineering
• Responsive community support
• Native code or ‘pure’
• Particularly important in the cloud
• Does it support the technologies you have chosen to use?
• Message encoding, Schema Registry, ...
Let’s take a look at some different clients
Project               Language               Pure or native code
Apache Kafka client   Java                   pure
librdkafka            C / C++                –
node-kafka-native     JavaScript (Node.js)   native
node-rdkafka          JavaScript (Node.js)   native
confluent-kafka-go    Go                     native
Sarama                Go                     pure
kafkacat              CLI / Shell scripts    –
Confluent Kafka REST  Any                    –
Java producer
• Part of Apache Kafka
• Best for feature support and performance
• Asynchronous with batching
• Highly configurable
• Rich metrics
https://kafka.apache.org/0110/javadoc/index.html
Java consumer
• Part of Apache Kafka
• Best for feature support and performance
• Single-threaded
• Polls for records, and polling also serves as a liveness check
• Commits offsets automatically, or explicitly (async or sync)
https://kafka.apache.org/0110/javadoc/index.html
C / C++ librdkafka
• Fully featured native code Kafka client library
• Portable, so it supports Linux, macOS, Windows and more
• Used as the basis for many client libraries for other languages
• Does a good job of keeping pace with Kafka releases
• A bit tricky to build on platforms other than Linux if you want security
• SASL only recently supported on Windows
• SSL on Mac requires Homebrew
• Can emit metrics
• At broker and topic-partition levels
https://github.com/edenhill/librdkafka
C librdkafka producer
• Concepts very similar to Apache Kafka client
• But you have to manage memory yourself
• Uses callbacks to report status but you have to poll to have them fire
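The "poll to have callbacks fire" point for the librdkafka producer can be illustrated with a toy model. This is not the librdkafka API: ToyProducer and its methods are hypothetical, and delivery is faked as instantaneous. It only shows the pattern in which delivery reports queue up internally and the application's callback runs on the thread that calls poll (the role rd_kafka_poll() plays in the C library).

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

class ToyProducer:
    """Toy model: reports are queued on produce() and served only by poll()."""

    def __init__(self, on_delivery: Callable[[str, bytes], None]) -> None:
        self._on_delivery = on_delivery
        self._reports: Deque[Tuple[str, bytes]] = deque()

    def produce(self, topic: str, value: bytes) -> None:
        # Pretend the message was delivered; queue the report instead of calling back
        self._reports.append((topic, value))

    def poll(self) -> int:
        """Fire queued callbacks on the caller's thread; return how many were served."""
        served = 0
        while self._reports:
            self._on_delivery(*self._reports.popleft())
            served += 1
        return served

delivered: List[Tuple[str, bytes]] = []
producer = ToyProducer(lambda topic, value: delivered.append((topic, value)))
producer.produce("test", b"hello")
producer.produce("test", b"world")
seen_before_poll = list(delivered)   # still empty: nothing fires until poll()
served = producer.poll()
```

An application that never polls never learns whether its messages were delivered, which is why the examples for this library call poll regularly.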
C librdkafka consumer
• This is the low-level consumer interface
• The high-level one supports consumer groups
• Thread-safe (unlike Java)
C++ librdkafka consumer
• Built on top of the C library
• Looks more similar to Java, primarily because it’s object-oriented
• Again, there’s a need to make a regular call to respond to callbacks
JS node-kafka-native
• Another Node.js module wrapping librdkafka
• Looked promising but ultimately not updated to keep up with new features
• No updates for a long time now
• Use node-rdkafka instead
https://github.com/alfred-landrum/node-kafka-native
JS node-rdkafka producer
• Third-party Node.js module wrapping librdkafka
• Natural Node.js style of event delivery
• A good example of the community working well
https://github.com/Blizzard/node-rdkafka
JS node-rdkafka consumer
• Supports many of the features of consuming such as rebalancing, committing offsets
• There’s also a streaming interface
https://github.com/Blizzard/node-rdkafka
confluent-kafka-go producer
• Confluent Go client based on librdkafka
• Two variants of producer
• Function-based
• Channel-based
• Delivery reports emitted on Events channel
confluent-kafka-go consumer
• This variant of the API uses polling and then a type switch
confluent-kafka-go consumer
• This variant of the API uses a channel to deliver messages and events such as rebalance
Go Sarama producer
• Third-party pure Go client
• Currently at 0.10.2.x level
https://shopify.github.io/sarama/
Go Sarama consumer
• Consumer groups not supported yet
• No offset tracking
• Available as 3rd party extensions
https://shopify.github.io/sarama/
kafkacat
• Command-line, non-JVM Kafka producer and consumer
• Unsurprisingly, uses librdkafka too
• Useful in shell scripts and just for trying stuff out on the command line
https://github.com/edenhill/kafkacat
Confluent Kafka REST
• Part of Confluent Platform
• Integrated with Schema Registry
• Use any language…
• A bit tricky to format the data correctly
https://github.com/confluentinc/kafka-rest
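One reason the data is tricky to format: with a binary embedded format, keys and values must be base64-encoded inside the JSON body. A minimal sketch of building such a body, assuming the REST Proxy v2 "binary" format and its content type as I understand them; check both against the documentation for your version. The function name build_produce_body is hypothetical, and no HTTP call is made here.

```python
import base64
import json
from typing import List, Optional, Tuple

# Assumed content type for the v2 binary embedded format
CONTENT_TYPE = "application/vnd.kafka.binary.v2+json"

def build_produce_body(records: List[Tuple[Optional[bytes], bytes]]) -> str:
    """Build the JSON body for a produce request from (key, value) byte pairs."""
    out = []
    for key, value in records:
        record = {"value": base64.b64encode(value).decode("ascii")}
        if key is not None:
            record["key"] = base64.b64encode(key).decode("ascii")
        out.append(record)
    return json.dumps({"records": out})

body = build_produce_body([(None, b"Kafka")])
# body would be POSTed to /topics/<topic> with the Content-Type header above
```

Sending raw text where base64 is expected is a typical source of rejected or garbled messages.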
Do non-Java users face many issues?
• Most problems are conceptual
• Many new users struggle with the concepts of Kafka ☹
• Users assume it’s the same as traditional messaging systems
• Partitions, consumer groups, at-least-once semantics, ...
• Documentation nowadays is getting really good
• Historically, lack of best-practice examples in the various languages
• Handling expected errors properly is a common theme
• Failed commits, producer timeouts, ...
• Non-Java clients lag behind Java in features
• librdkafka doing a great job here, but dependent clients need to expose the features
• Even more true for independent clients
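At-least-once semantics and failed commits are easier to reason about with a toy model. The sketch below uses no Kafka client at all: the log is a Python list, the committed offset is an integer, and consume is a hypothetical function that processes a record and only then commits. A simulated crash between processing and committing shows why a handler must tolerate seeing a record twice.

```python
from typing import Callable, List

def consume(log: List[str], committed: int, handle: Callable[[str], None],
            crash_after: int = -1) -> int:
    """Process records from `committed` onward, committing after each one.

    If `crash_after` equals an offset, the consumer "crashes" after processing
    that record but before committing it. Returns the last committed offset.
    """
    for offset in range(committed, len(log)):
        handle(log[offset])
        if offset == crash_after:
            return committed          # commit for this record never happened
        committed = offset + 1        # commit: next offset to read
    return committed

log = ["a", "b", "c"]
seen: List[str] = []
committed = consume(log, 0, seen.append, crash_after=1)   # handles a, b; only a is committed
committed = consume(log, committed, seen.append)          # restart: b is delivered again
# seen == ["a", "b", "b", "c"]
```

Committing before processing would flip the failure mode to at-most-once: the crash would lose "b" instead of repeating it.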
Summary
• Kafka has mature clients for several popular languages
• Java still gives the best experience
• librdkafka is delivering a solid base for non-Java clients
• At the expense of native code
• Some third-party ‘pure’ clients look good too
• But the community support needs to stay the course
A few links
Kafka Protocol
http://kafka.apache.org/protocol
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
Kafka Clients directory
https://cwiki.apache.org/confluence/display/KAFKA/Clients
Code samples / modules
https://github.com/ibm-messaging/message-hub-samples
https://www.npmjs.com/package/message-hub-rest
http://docs.confluent.io/current/clients/index.html
Q & A
Contact us through the Summit App or via email
ecomar@uk.ibm.com
andrew_schofield@uk.ibm.com
Thanks !

Kafka Summit SF 2017 - Kafka and the Polyglot Programmer
