A Lunchtime Introduction to Kafka
Wrote this….
Works with this amazing bunch.
The Rules Haven’t Changed……
You may heckle via Twitter…..
@jasonbelldata
It’s Friday, you might as well.
It’s in print, no one can argue…..
What is Kafka?
It’s an immutable log.
Think of it like this…..
My Kafka Log...
Opinionated thought
approaching.
#grumpy_yorkshireman
IT IS NOT A DATABASE
Kafka is an event processing platform based on a distributed architecture.
To producers and consumers subscribed to the system it appears as a standalone
processing engine, but production systems are built across many machines.
It can handle a throughput of millions of messages, dependent on the physical
disk and RAM of the machines, and is fault tolerant.
Messages are sent to topics, which are written sequentially to an immutable log.
Kafka supports many topics, and each topic can be replicated and partitioned.
Once the records are appended to the topic log
they can’t be deleted or amended.
(If the customer wants you to
“replay from the start”, well they can’t)
It’s a very simple data structure
where each message is byte encoded.
Each Message has:
A key
A value (payload)
A timestamp
A header (for metadata)
Producers and consumers can serialise and deserialise data
to various formats.
(I’ll talk about that later)
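To make that record anatomy concrete, here is a minimal sketch (not from the deck) that builds a single record with a key, a value, an explicit timestamp and one header; the topic name and the header key are placeholders.

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.header.Header;
import org.apache.kafka.common.header.internals.RecordHeader;

public class RecordAnatomy {
  public static void main(String[] args) {
    // One metadata header; the key/value here are made up for illustration.
    List<Header> headers = Arrays.asList(
        new RecordHeader("source", "lunchtime-demo".getBytes(StandardCharsets.UTF_8)));

    ProducerRecord<String, String> record = new ProducerRecord<>(
        "testtopic",                  // topic
        null,                         // partition (null = let the partitioner decide)
        System.currentTimeMillis(),   // timestamp
        "my-key",                     // key
        "my payload",                 // value (payload)
        headers);                     // headers for metadata

    System.out.println(record);
  }
}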
The Kafka Ecosystem
Image source: Confluent Inc.
The Kafka Cluster
Brokers
Partitions
Topics
Creating a Topic
bin/kafka-topics --create --zookeeper localhost:2181 --replication-factor 4 \
  --partitions 10 --topic testtopic --config min.insync.replicas=2
Listing Topics
bin/kafka-topics --zookeeper localhost:2181 --list
Deleting a Topic
$ bin/kafka-topics --zookeeper localhost:2181 --delete --topic
testtopic
Topic testtopic is marked for deletion.
Note: This will have no impact if delete.topic.enable is not set
to true.
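If you would rather not shell out, the same operations are available through the Java Admin API; a minimal sketch, assuming a single broker on localhost:9092 (hence a replication factor of 1 here rather than 4).

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class TopicAdmin {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");

    try (AdminClient admin = AdminClient.create(props)) {
      // Create a topic: name, partitions, replication factor.
      NewTopic topic = new NewTopic("testtopic", 10, (short) 1);
      admin.createTopics(Arrays.asList(topic)).all().get();

      // List topics.
      System.out.println(admin.listTopics().names().get());

      // Delete a topic (still subject to delete.topic.enable on the broker).
      // admin.deleteTopics(Arrays.asList("testtopic")).all().get();
    }
  }
}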
Topic Retention
Retention by Time
log.retention.hours
log.retention.minutes
log.retention.ms
The default is 168 hours (7 days) if not set.
Note: The smallest value setting takes priority, so be careful!
Retention by Size
log.retention.bytes
Defines the volume of log bytes retained. It is applied per partition, so if it is set to 1GB
and you have 8 partitions, you are storing up to 8GB.
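Retention can also be overridden per topic rather than broker-wide. A minimal sketch using the Admin API (2.3+ clients), assuming the testtopic from earlier; note that the per-topic property names are retention.ms and retention.bytes, while the log.retention.* names above are the broker defaults.

import java.util.Arrays;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class TopicRetention {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");

    try (AdminClient admin = AdminClient.create(props)) {
      ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "testtopic");
      // Keep at most ~1GB per partition and at most 24 hours of data.
      AlterConfigOp bySize = new AlterConfigOp(
          new ConfigEntry("retention.bytes", "1073741824"), AlterConfigOp.OpType.SET);
      AlterConfigOp byTime = new AlterConfigOp(
          new ConfigEntry("retention.ms", "86400000"), AlterConfigOp.OpType.SET);
      admin.incrementalAlterConfigs(
          Collections.singletonMap(topic, Arrays.asList(bySize, byTime))).all().get();
    }
  }
}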
Client Libraries
C/C++ github.com/edenhill/librdkafka
Go github.com/confluentinc/confluent-kafka-go
Java Kafka Consumer and Kafka Producer
JMS JMS Client
.NET github.com/confluentinc/confluent-kafka-dotnet
Python github.com/confluentinc/confluent-kafka-python
Client Library Language Support
Client Libraries: Java and Go.
Java: Full support for all API features.
Go: exactly-once semantics, Kafka Streams, Schema Registry and simplified installation are NOT supported.
Producers
(Send messages to the Kafka cluster)
import java.util.Properties;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
  public static void main(String[] args) throws Exception {
    String topicName = "testtopic";
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("acks", "all");
    props.put("retries", 0);
    props.put("batch.size", 10);
    props.put("linger.ms", 1);
    props.put("buffer.memory", 33554432);
    props.put("key.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");

    Producer<String, String> producer = new KafkaProducer<String, String>(props);
    for (int i = 0; i < 10; i++)
      producer.send(new ProducerRecord<String, String>(topicName,
          Integer.toString(i), "This is my message: " + Integer.toString(i)));
    System.out.println("Message sent successfully");
    producer.close();
  }
}
props.put("acks", “all");
Value    Action
-1/all   The message has been received by the leader and its followers.
0        Fire and forget.
1        The message has been received by the leader only.
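For reference, the same setting using the ProducerConfig constants, added to the props from the producer example above; a small sketch, and worth noting that idempotent producers require acks=all.

// Using the ProducerConfig constants instead of raw strings.
props.put(ProducerConfig.ACKS_CONFIG, "all");   // or "1", or "0" for fire-and-forget
// Idempotence (no duplicates on retry) requires acks=all.
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);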
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ProducerWithCallback implements ProducerInterceptor<String, String> {
  private int onSendCount;
  private int onAckCount;
  private final Logger logger = LoggerFactory.getLogger(ProducerWithCallback.class);

  @Override
  public ProducerRecord<String, String> onSend(final ProducerRecord<String, String> record) {
    onSendCount++;
    System.out.println(String.format("onSend topic=%s key=%s value=%s partition=%d%n",
        record.topic(), record.key(), record.value().toString(), record.partition()));
    return record;
  }

  @Override
  public void onAcknowledgement(final RecordMetadata metadata, final Exception exception) {
    onAckCount++;
    System.out.println(String.format("onAck topic=%s, part=%d, offset=%d%n",
        metadata.topic(), metadata.partition(), metadata.offset()));
  }

  @Override
  public void close() {
    System.out.println("Total sent: " + onSendCount);
    System.out.println("Total acks: " + onAckCount);
  }

  @Override
  public void configure(Map<String, ?> configs) {
  }
}
// Only allow one in-flight message per Kafka broker connection
// - max.in.flight.requests.per.connection (default 5)
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);
// Set the number of retries - retries
props.put(ProducerConfig.RETRIES_CONFIG, 3);
// Request timeout - request.timeout.ms
props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 15000);
// Only retry after one second - retry.backoff.ms
props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, 1000);
import java.util.Properties;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
  public static void main(String[] args) throws Exception {
    String topicName = "testtopic";
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("acks", "all");
    props.put("retries", 0);
    props.put("batch.size", 16384);
    props.put("linger.ms", 1);
    props.put("buffer.memory", 33554432);
    props.put("key.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");

    Producer<String, String> producer = new KafkaProducer<String, String>(props);
    for (int i = 0; i < 10; i++)
      producer.send(new ProducerRecord<String, String>(topicName,
          Integer.toString(i), "This is my message: " + Integer.toString(i)));
    System.out.println("Message sent successfully");
    producer.close();
  }
}
Serialize/Deserialize Types
byte[]      Serdes.ByteArray(), Serdes.Bytes()
ByteBuffer  Serdes.ByteBuffer()
Double      Serdes.Double()
Integer     Serdes.Integer()
Long        Serdes.Long()
String      Serdes.String()
JSON        JsonPOJOSerializer()/Deserializer()
Consumers
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SimpleConsumer {
  public static void main(String[] args) throws Exception {
    if (args.length == 0) {
      System.out.println("Enter topic name");
      return;
    }
    // Kafka consumer configuration settings
    String topicName = args[0];
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "test");
    props.put("enable.auto.commit", "true");
    props.put("auto.commit.interval.ms", "1000");
    props.put("session.timeout.ms", "30000");
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");

    KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
    // The Kafka consumer subscribes to a list of topics here.
    consumer.subscribe(Arrays.asList(topicName));
    System.out.println("Subscribed to topic " + topicName);

    while (true) {
      ConsumerRecords<String, String> records = consumer.poll(100);
      for (ConsumerRecord<String, String> record : records)
        // Print the offset, key and value for each consumer record.
        System.out.printf("offset = %d, key = %s, value = %s%n",
            record.offset(), record.key(), record.value());
    }
  }
}
Consumer Groups
Consumer Group Offsets
The offset is a simple integer that Kafka uses to maintain the current position
of a consumer within each partition. In the kafka-consumer-groups output below,
LAG is simply LOG-END-OFFSET minus CURRENT-OFFSET.
kafka-consumer-groups --bootstrap-server localhost:9092 --describe --group mygroup
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG OWNER
mygroup t1 0 1 3 2 test-consumer
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG OWNER
mygroup t1 0 1 12 11 test-consumer
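A minimal sketch of managing offsets from the Java client, assuming the testtopic and group above; auto-commit is disabled so the commit that moves CURRENT-OFFSET is explicit.

import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class OffsetExample {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "mygroup");
    props.put("enable.auto.commit", "false"); // commit offsets manually
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
      consumer.subscribe(Arrays.asList("testtopic"));
      ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
      System.out.println("Fetched " + records.count() + " records");
      // Commit the offsets of the records just returned; this moves CURRENT-OFFSET.
      consumer.commitSync();
      // To re-read a partition from the start, seek back over the current assignment.
      consumer.seekToBeginning(consumer.assignment());
    }
  }
}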
Streaming API
import java.util.Arrays;
import java.util.Properties;
import java.util.regex.Pattern;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.test.TestUtils;

public class WordCountLambdaExample {
  static final String inputTopic = "streams-plaintext-input";
  static final String outputTopic = "streams-wordcount-output";

  /**
   * The Streams application as a whole can be launched like any normal Java application that has a `main()` method.
   */
  public static void main(final String[] args) {
    final String bootstrapServers = args.length > 0 ? args[0] : "localhost:9092";
    // Configure the Streams application.
    final Properties streamsConfiguration = getStreamsConfiguration(bootstrapServers);
    // Define the processing topology of the Streams application.
    final StreamsBuilder builder = new StreamsBuilder();
    createWordCountStream(builder);
    final KafkaStreams streams = new KafkaStreams(builder.build(), streamsConfiguration);
    streams.cleanUp();
    streams.start();
    Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
  }

  static Properties getStreamsConfiguration(final String bootstrapServers) {
    final Properties streamsConfiguration = new Properties();
    streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-lambda-example");
    streamsConfiguration.put(StreamsConfig.CLIENT_ID_CONFIG, "wordcount-lambda-example-client");
    streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    streamsConfiguration.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
    streamsConfiguration.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
    streamsConfiguration.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10 * 1000);
    streamsConfiguration.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
    streamsConfiguration.put(StreamsConfig.STATE_DIR_CONFIG, TestUtils.tempDirectory().getAbsolutePath());
    return streamsConfiguration;
  }

  static void createWordCountStream(final StreamsBuilder builder) {
    final KStream<String, String> textLines = builder.stream(inputTopic);
    final Pattern pattern = Pattern.compile("\\W+", Pattern.UNICODE_CHARACTER_CLASS);
    final KTable<String, Long> wordCounts = textLines
        .flatMapValues(value -> Arrays.asList(pattern.split(value.toLowerCase())))
        .groupBy((keyIgnored, word) -> word)
        .count();
    wordCounts.toStream().to(outputTopic, Produced.with(Serdes.String(), Serdes.Long()));
  }
}
https://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/
Supports exactly once transactions! Yay!
I know what you’re thinking…...
processing.guarantee=exactly_once
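In the word-count example above, that property would go into getStreamsConfiguration(); a minimal sketch using the StreamsConfig constants (newer client and broker versions prefer EXACTLY_ONCE_V2).

// Enable exactly-once processing in the Streams configuration.
streamsConfiguration.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
// On recent Kafka versions, StreamsConfig.EXACTLY_ONCE_V2 is the preferred value.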
What is Kafka Connect?
Provides a mechanism for
data sources and data
sinks.
Image source: Confluent Inc.
Data Source
Data from a thing (Database,
Twitter stream, Logstash etc)
going to Kafka.
Data Sink
Data going from Kafka to a
thing (Database, file,
ElasticSearch)
There are many
community connector
plugins that will fit most
needs.
A Source Example
{
"name": "twitter_source_json_01",
"config": {
"connector.class":
"com.github.jcustenborder.kafka.connect.twitter.TwitterSourceConnector",
"twitter.oauth.accessToken": "xxxxxxxxxxxxx",
"twitter.oauth.consumerSecret": "xxxxxxxxxxxx",
"twitter.oauth.consumerKey": "xxxxxxxxxxxx",
"twitter.oauth.accessTokenSecret": "xxxxxxxxxxx",
"kafka.delete.topic": "twitter_deletes_json_01",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"key.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": false,
"key.converter.schemas.enable": false,
"kafka.status.topic": "twitter_json_01",
"process.deletes": true,
"filter.keywords": "#instil, #virtualdevbash"
}
}
curl -XPOST -H 'Content-type: application/json' -d @myconnector.json \
  http://localhost:8083/connectors
A Sink Example
{
  "name": "string-jdbc-sink",
  "config": {
    "connector.class": "com.domain.test.StringJDBCSink",
    "connector.type": "sink",
    "tasks.max": "1",
    "topics": "topic_to_read",
    "topic.type": "avro",
    "tablename": "testtopic",
    "connection.user": "dbuser",
    "connection.password": "dbpassword_encoded",
    "connection.url": "jdbc:mysql://localhost:3307/connecttest",
    "query.string": "INSERT INTO testtopic (payload) VALUES(?);",
    "db.driver": "com.mysql.jdbc.Driver",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://localhost:8081"
  }
}
Dead Letter Queues
Sometimes stuff doesn’t
work……
Image source: Confluent Inc.
"errors.tolerance": "all",
"errors.log.enable": true,
"errors.log.include.messages": true
"errors.deadletterqueue.topic.name": "dlq_topic_name",
"errors.deadletterqueue.topic.replication.factor": 1,
"errors.deadletterqueue.context.headers.enable": true,
"errors.retry.delay.max.ms": 60000,
"errors.retry.timeout": 300000
Monitoring
At present, the main Kafka broker metrics (via JMX) live under these domains:
kafka.server:*
kafka.controller:*
kafka.coordinator.group:*
kafka.coordinator.transaction:*
kafka.log:*
kafka.network:*
kafka.utils:*
Topic Level Fetch Metrics
MBean: kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+),topic=([-.\w]+)
Key Name                  Description
fetch-size-avg            The average number of bytes fetched per request for a specific topic.
fetch-size-max            The maximum number of bytes fetched per request for a specific topic.
bytes-consumed-rate       The average number of bytes consumed per second for a specific topic.
records-per-request-avg   The average number of records in each request for a specific topic.
records-consumed-rate     The average number of records consumed per second for a specific topic.
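These fetch metrics can also be read programmatically via the client's metrics() method rather than JMX; a minimal sketch of a helper that prints them for a given consumer.

import java.util.Map;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.Metric;
import org.apache.kafka.common.MetricName;

public class FetchMetricsPrinter {
  // Print the consumer-fetch-manager metrics listed above for a given consumer.
  static void printFetchMetrics(KafkaConsumer<?, ?> consumer) {
    Map<MetricName, ? extends Metric> metrics = consumer.metrics();
    for (Map.Entry<MetricName, ? extends Metric> entry : metrics.entrySet()) {
      MetricName name = entry.getKey();
      if ("consumer-fetch-manager-metrics".equals(name.group())) {
        System.out.println(name.name() + " " + name.tags() + " = "
            + entry.getValue().metricValue());
      }
    }
  }
}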
Producer Level Application Metrics (using JConsole)
Consumer Level Application Metrics (using JConsole)
Thank you.
Twitter: @digitalis_io Web: https://digitalis.io