Kafka Summit 2020
Highlights
Sep 2020
Kafka Summit 2020: By the numbers
https://kafkasummit.io/
• 10 Keynote Speakers
• 87 Sessions
• 46 Birds of a Feather & Ask the Experts
• 102 Session Speakers
Attendees from 143 Countries
Sessions
• Opening Keynote — Gwen Shapira
• Feed your SIEM Smart with Kafka Connect
• Building a Modern, Scalable Cyber Intelligence Platform with Confluent Kafka
• Learnings From the Field. Lessons From Working with Dozens of Small & Large Deployments
• Maximize the Business Value of Machine Learning and Data Science with Kafka
• Measuring your Digital Transformation: Why Real Time Analytics are the Critical Next Step
• MQTT and Apache Kafka: The Solution to Poor Internet Connectivity in Africa
• The Flux Capacitor of Kafka Streams and ksqlDB
• Keynote: Kafka ♥ Cloud — Jay Kreps
Gwen Shapira
Opening Keynote
Feed your SIEM Smart with Kafka Connect
Vitalii Rudenskyi
Information Security Architect
McKesson Corporation
Background and Motivation
• How to use Kafka Connect to ingest, consume and deliver data to a SIEM
• Migrating from an old-generation SIEM to a new SIEM solution
• Not easy, and not fun at all!
• Goal: not to make the same mistake again
• Decided to own data ingestion and consumption
• No dependence on any vendor
Challenges
Architecture
The 3 Keys
Push - NettySource Connector
Pull - PollableAPIClient Connector
Pull - PollableAPIClient Connector
API Client Implementations
Customized Transformation
Data Archiving
Lessons Learned
Key Metrics
Sharing
• NettySource Connector
• https://github.com/vrudenskyi/kafka-connect-netty-source
• PollableAPIClient Connector
• https://github.com/vrudenskyi/kafka-connect-pollable-source
• Transformations Library
• https://github.com/vrudenskyi/kafka-connect-transform
Key Takeaways
• Kafka has become an integral part of enterprise SIEM modernization
• With Kafka and Kafka Connect, customers can take a “vendor-agnostic” approach to their SIEM strategy
• Kafka Connect is a flexible solution for dealing with various data sources/sinks and different data formats. In this particular case, 530+ connectors have been deployed
• Kafka Connect is highly extensible: customers can customize or develop their own connectors based on their requirements
• The speaker has developed a custom transformation library to transform different parts of the messages, and is looking to implement stream processing as the next step
• The speaker shared smart tips on making use of headers and has created a solution for connector high availability
Jack Noel - Security Solutions Architect - Intel
Building a Modern, Scalable Cyber Intelligence Platform with Confluent Kafka
Intel Information Security’s Mission
To Keep Intel’s Intel legal and secure!
The mission is never done, e.g. finding the balance between infosec requirements and being cost-effective and agile.
APAC Kafka Summit - Best Of
● Data Filtering
● Data comes from Partners
● IT non-security data
● Prioritise high-value data
● Enrich Data
● Asset information and locations
● IP CIDR locations
● Process with KSTREAMS into clean topics
● Acquire data once, consume many times
● Using data is expensive
● Filtering, joining and enriching data in-stream to provide rich data upstream
● ML Instream - Advanced
● Become Predictive!
● Reduce technical debt, e.g. point-to-point integrations
● Always on
● Thriving community
● Slide speaks for itself!
● Jack Noel authored some of these
Key Takeaways
• There are a lot of security vendors, each with its own way of producing and parsing data. Kafka makes that more seamless
• No vendor lock-in, e.g. use OS native as much as possible
• Share data with other teams
• Collect data from other teams, e.g. vendors or IT
• Operational maturity is important to ensure success, e.g. people, process and tools
Mitch Henderson - Customer Success Technical Architect
Learnings From the Field. Lessons From Working with Dozens of Small & Large Deployments
Key do's and don'ts for managing Kafka installations
• Upgrades - do them well and often, don't fall into the trap of sticking with an old version
• How to execute upgrades well
• Monitoring: JMX
• Configuration - varying from defaults, recommended tunables
• Logging
• Quotas
• Clusters - single or multiple
Key takeaways
• If running fully managed Apache Kafka on Confluent Cloud is not an option, remember that Kafka is a distributed system and takes careful and deliberate management.
• There are many recommended changes from the default settings for certain types of production operations - out-of-the-box (OOTB) settings are really development settings.
• Make upgrades part of Kafka muscle memory.
• Don't wait until you have problems to apply recommended settings and guardrails.
• If in doubt, hire a professional - Confluent PS.
Tom Szumowski - Senior Data Scientist, Nuuly
Chirag Dadia - Director of Engineering, Nuuly
Maximize the Business Value of Machine Learning and Data Science with Kafka
Background and Motivation
Nuuly is a clothing rental subscription service driven by a Kafka-based architecture.
As an online platform, they are continually looking to improve their service to better meet the needs of their customers and to drive revenue.
Data analytics and machine learning form a large part of their optimisation strategy, but how can this mostly offline, batch-style processing be implemented on a real-time platform?
Challenge - typical warehouses track SKU and stock level, whereas Nuuly tracks individual items. Real-time inventory becomes critical: a rented item must not be rented twice at the same time.
Data Science
Everything is asynchronous: ETL pipelines transform the data and send it to the warehouse
ML that integrates with user interactions
Stream processors are used to materialise state. Kafka is our data store.
Adding new steps is as easy as building a new microservice.
Measuring your Digital Transformation: Why Real Time Analytics are the Critical Next Step
Rachel Pedreschi, Vice President of Community
Imply
Motivations
Motivations (2)
Common Current State
Characteristics of “Real-Time”
Real-Time Analytics
The Data River
Digital Native Expectations
Velocity of Data vs Velocity of Understanding
Key Takeaways
• Shared vision across the organization - common view of the world (data)
• Event driven adds value to customer interactions
• “Gen Z” expectations - contextual, personalized, real-time
• Real-time is freshness of data + fast analytics
• Velocity of data and velocity of understanding are distinct measures
Fadhili Juma
Remitly Inc
MQTT and Apache Kafka: The Solution to Poor Internet Connectivity in Africa
Background and Motivation
• Internet connectivity is a problem in the remote villages of Africa
• There were challenges implementing agency banking in villages in Tanzania
• How MQTT and Apache Kafka were used to overcome these problems
*** The agency banking model is a function of certain commercial banks in Kenya, regulated by Central Bank of Kenya legislation, that allows them to contract third-party retail networks as banking agents. Upon successful application, vetting and approval, these agents are authorized to offer selected products and services on behalf of the bank. This relationship creates an agency banking business model.
Challenges
Phase 1
• Phase 1 was tried with REST APIs
• Lots of resources consumed
• Problems with connections
• Lost transactions
Phase 2 - MQTT Evaluation
MQTT Cons
Phase 2.1
Architecture
Components
Key Takeaways
• How MQTT and Apache Kafka have been leveraged to provide digital banking to a region with poor internet connectivity
• MQTT maintains the session from handheld devices, but it does not provide long-term message storage. MQTT connectivity with downstream enterprise solutions is also limited.
• Apache Kafka is used to store data for a longer period of time and, with connectors, the data can be pushed to all the downstream systems where further analytics can be done
• The MQTT connector is used as the bridge between Kafka and MQTT
• Since managing Kafka is a specialized job that requires a lot of effort and money, they are gradually moving to Confluent Cloud
Matthias J. Sax | Software Engineer, Confluent
The Flux Capacitor of Kafka Streams and ksqlDB
Stream Processing is our Density.
Recap: Time 101
@MatthiasJSax
Event Time
• When an event happened (embedded in the message/record)
• Ensures deterministic processing
• Used to express processing semantics, i.e., impacts the result
Processing Time (aka Wall-clock Time)
• When an event/message/record is processed
• Used for non-functional properties
  • Timeouts
  • Data rate control
  • Periodic actions
• Should not impact the result: otherwise, non-deterministic
Yeah, well, history is gonna change
Input records with descending event timestamp are considered out-of-order
• Out-of-order if event-time < stream-time
[Figure: input records with event times 14:01, 14:03, 14:08, 14:01, 14:02, 14:11; stream-time advances 14:01 -> 14:03 -> 14:08 -> 14:11; two records are marked out-of-order, at stream-time 14:03 and 14:08.]
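The out-of-order rule above can be sketched in a few lines of plain Java. This is only an illustration of the rule "event-time < stream-time", not Kafka Streams code: the class `StreamTimeTracker`, its methods, and the arrival order of the records are assumptions (the slide gives the timestamps, not an unambiguous order). Times are minutes since midnight.

```java
class StreamTimeTracker {
    // Stream-time: highest event timestamp seen so far, in minutes since midnight.
    private int streamTime = Integer.MIN_VALUE;

    /** Returns true if the record is out-of-order, i.e. event-time < stream-time. */
    boolean observe(int eventTime) {
        boolean outOfOrder = eventTime < streamTime;
        streamTime = Math.max(streamTime, eventTime); // stream-time never moves backwards
        return outOfOrder;
    }

    int streamTime() {
        return streamTime;
    }

    static int hm(int hours, int minutes) {
        return hours * 60 + minutes;
    }

    public static void main(String[] args) {
        // Assumed arrival order; only the timestamps themselves come from the slide.
        int[] arrivals = {hm(14, 1), hm(14, 3), hm(14, 1), hm(14, 8), hm(14, 2), hm(14, 11)};
        StreamTimeTracker tracker = new StreamTimeTracker();
        for (int ts : arrivals) {
            boolean outOfOrder = tracker.observe(ts);
            System.out.printf("event %02d:%02d stream-time %02d:%02d%s%n",
                    ts / 60, ts % 60, tracker.streamTime() / 60, tracker.streamTime() % 60,
                    outOfOrder ? " (out-of-order)" : "");
        }
    }
}
```

With this assumed order, stream-time only advances on the records at 14:01, 14:03, 14:08 and 14:11, and the other two records are flagged as out-of-order.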
You are not thinking fourth-dimensionally
[Figure: Topic-A, Partition 0 holds buffered records while Topic-B, Partition 0 is empty; event times shown are 14:01, 14:02, 14:03, 14:04, 14:05, 14:08 and 14:11.]
Pause processing and poll() for new data.
Unblock when timeout max.task.idle.ms hits.
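The pause-and-unblock behaviour can be modelled roughly as follows. This is a simplified sketch in plain Java, not the Kafka Streams implementation: `TimestampSync`, `choose`, and the idea of passing the elapsed idle time as a parameter are all assumptions made for illustration. Only the config name `max.task.idle.ms` comes from the slide.

```java
import java.util.Deque;
import java.util.List;

class TimestampSync {
    /**
     * Picks the input buffer whose head record has the smallest timestamp.
     * Returns -1 ("pause and poll again") while any buffer is empty and the
     * idle time is still below maxTaskIdleMs, or when all buffers are empty.
     */
    static int choose(List<Deque<Integer>> buffers, long idleMs, long maxTaskIdleMs) {
        boolean anyEmpty = false;
        int best = -1;
        for (int i = 0; i < buffers.size(); i++) {
            Deque<Integer> buffer = buffers.get(i);
            if (buffer.isEmpty()) {
                anyEmpty = true;
                continue;
            }
            if (best == -1 || buffer.peekFirst() < buffers.get(best).peekFirst()) {
                best = i;
            }
        }
        if (anyEmpty && idleMs < maxTaskIdleMs) {
            return -1; // an empty input might still deliver an older record: keep waiting
        }
        return best; // timeout hit (or nothing empty): process the oldest available record
    }
}
```

The trade-off this models: a larger timeout gives the empty partition more time to catch up, which favours timestamp order over latency.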
When the hell are they?
Time Windows
Tumbling Windows
• fixed size / non-overlapping / grouped (i.e., GROUP BY)
[Figure: time axis with window boundaries at 14:00, 14:05, 14:10, 14:15]
No variable-size window support yet:
• Weeks, months, years
• No out-of-the-box time zone support
• https://github.com/confluentinc/kafka-streams-examples/blob/5.5.0-post/src/test/java/io/confluent/examples/streams/window/DailyTimeWindows.java
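Because tumbling windows are fixed-size and non-overlapping, the window a record belongs to follows from its timestamp alone. A minimal sketch, assuming windows aligned to time zero and intervals of the form [start, start + size); the class and method names are hypothetical, and times are minutes since midnight rather than epoch milliseconds.

```java
class TumblingWindowSketch {
    /** Start of the tumbling window [start, start + size) that contains ts. */
    static long windowStart(long ts, long size) {
        return ts - Math.floorMod(ts, size);
    }

    static String fmt(long minutes) {
        return String.format("%02d:%02d", minutes / 60, minutes % 60);
    }

    public static void main(String[] args) {
        long size = 5; // 5-minute windows, as in the figure
        for (long ts : new long[] {14 * 60 + 3, 14 * 60 + 5, 14 * 60 + 9}) {
            long start = windowStart(ts, size);
            System.out.println(fmt(ts) + " -> [" + fmt(start) + ", " + fmt(start + size) + ")");
        }
    }
}
```

A record at exactly 14:05 lands in the window starting at 14:05, since the end of each window is exclusive.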
Time Windows
Hopping Windows
• fixed size / overlapping / grouped (i.e., GROUP BY)
• Different to a sliding window!
[Figure: overlapping 5-minute windows on a time axis, each row shifted by one minute: boundaries 14:00/14:05/14:10/14:15, 14:01/14:06/14:11/14:16, 14:02/14:07/14:12/14:17, 14:03/14:08/14:13/14:18, 14:04/14:09/14:14/14:19]
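With hopping windows a single record falls into several overlapping windows. The sketch below lists the start of every window that contains a given timestamp, assuming a 5-minute size and a 1-minute advance as the figure suggests. Names are hypothetical and times are minutes since midnight; it is an illustration of the arithmetic, not library code.

```java
import java.util.ArrayList;
import java.util.List;

class HoppingWindowSketch {
    /** Starts of all windows [start, start + size) that contain ts, in ascending order. */
    static List<Long> windowStarts(long ts, long size, long advance) {
        List<Long> starts = new ArrayList<>();
        // Earliest aligned start whose window still covers ts (never below zero).
        long start = Math.max(0, ts - size + advance) / advance * advance;
        for (; start <= ts; start += advance) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // A record at 14:03 (843) belongs to the windows starting 13:59 .. 14:03.
        System.out.println(windowStarts(14 * 60 + 3, 5, 1));
    }
}
```

The number of windows per record is size / advance (five here), which is why hopping windows multiply state and output compared with tumbling windows.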
Sliding Windows
Different use case: aggregate the data of the last (e.g.) 10 minutes
• Window boundaries are data dependent and unknown upfront (cf. KIP-450)
[Figure: records at 14:03, 14:07, 14:12, 14:19, 14:26 and the resulting 10-minute windows: 13:53-14:03, 13:57-14:07, 14:02-14:12, 14:04-14:14, 14:08-14:18, 14:09-14:19, 14:13-14:23, 14:16-14:26, 14:20-14:30]
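The data-dependent boundaries can be reproduced with a small sketch. It assumes that each record defines one window ending at its timestamp and, if any record falls inside it, one window starting just after its timestamp, with both ends inclusive. Minute granularity is an assumption made to match the figure, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class SlidingWindowsSketch {
    /** Returns the non-empty sliding windows as "HH:mm-HH:mm", ordered by start. */
    static List<String> windows(List<Integer> recordTimes, int size) {
        TreeSet<Integer> starts = new TreeSet<>();
        for (int ts : recordTimes) {
            starts.add(ts - size); // window that ends at this record
            int next = ts + 1;     // window that starts just after this record
            for (int other : recordTimes) {
                if (other >= next && other <= next + size) {
                    starts.add(next); // keep it only if it contains at least one record
                    break;
                }
            }
        }
        List<String> result = new ArrayList<>();
        for (int start : starts) {
            result.add(fmt(start) + "-" + fmt(start + size));
        }
        return result;
    }

    static String fmt(int minutes) {
        return String.format("%02d:%02d", minutes / 60, minutes % 60);
    }

    public static void main(String[] args) {
        // Records at 14:03, 14:07, 14:12, 14:19, 14:26 with a 10-minute window.
        System.out.println(windows(List.of(843, 847, 852, 859, 866), 10));
    }
}
```

For the five records in the figure this yields the same nine windows the slide shows; the window that would start after 14:26 is omitted because it contains no record.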
When we are processing, we don’t need watermarks
Grace period: defines a cut-off for out-of-order records that are (too) late
• Grace period is defined per operator
• Late if stream-time - event-time > grace period
• Late data is ignored and not processed by the operator
[Figure: input records with event times 14:01, 14:03, 14:08, 14:01, 14:02, 14:11; stream-time advances 14:01 -> 14:03 -> 14:08 -> 14:11; with grace := 5min, one out-of-order record is late (delay: 6min).]
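The lateness rule, exactly as stated on the slide, can be sketched as below. The arrival order is an assumption chosen so that one record shows the 6-minute delay from the figure; `GracePeriodSketch` and `lateFlags` are hypothetical names, and times are minutes since midnight. It applies the slide's per-record rule and does not try to reproduce how a windowed operator evaluates grace internally.

```java
import java.util.ArrayList;
import java.util.List;

class GracePeriodSketch {
    /** For each record in arrival order: is it late (stream-time - event-time > grace)? */
    static List<Boolean> lateFlags(int[] arrivals, int graceMinutes) {
        List<Boolean> late = new ArrayList<>();
        int streamTime = Integer.MIN_VALUE;
        for (int eventTime : arrivals) {
            streamTime = Math.max(streamTime, eventTime); // advance stream-time first
            late.add(streamTime - eventTime > graceMinutes);
        }
        return late;
    }

    public static void main(String[] args) {
        // Assumed arrival order: 14:01, 14:03, 14:01, 14:08, 14:02, 14:11; grace 5 minutes.
        int[] arrivals = {841, 843, 841, 848, 842, 851};
        System.out.println(lateFlags(arrivals, 5));
    }
}
```

Under this order the second 14:01 record is out-of-order by 2 minutes and still accepted, while the 14:02 record arrives at stream-time 14:08, a 6-minute delay, and is dropped as late.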
Retention Time
How long to store data in a (windowed) table.

TimeWindows.of(Duration.ofMinutes(5L)).grace(Duration.ofMinutes(1L))
Materialized.as(…).withRetention(Duration.ofHours(1L))
WINDOW TUMBLING(SIZE 5 MINUTES, GRACE PERIOD 1 MINUTE, RETENTION TIME 1 HOUR)
[Figure: timeline along stream-time for a window with SIZE 5 MINUTES and GRACE PERIOD 1 MINUTE: windowStart @14:00, windowEnd @14:05, window close @14:06; retention (1 hour) is drawn from 14:05 to 15:05.]
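The timeline in the figure is simple arithmetic on size, grace period and retention. The sketch below reproduces it with `java.time`. The helper names are hypothetical, and counting retention from windowEnd follows the way the figure draws it, which should be read as the slide's depiction rather than a statement about how the state store measures retention.

```java
import java.time.Duration;
import java.time.LocalTime;

class WindowLifecycleSketch {
    static LocalTime windowEnd(LocalTime start, Duration size) {
        return start.plus(size);
    }

    /** The window closes once stream-time passes windowEnd + grace. */
    static LocalTime windowClose(LocalTime start, Duration size, Duration grace) {
        return windowEnd(start, size).plus(grace);
    }

    /** End of retention as drawn in the figure: windowEnd + retention. */
    static LocalTime retentionEnd(LocalTime start, Duration size, Duration retention) {
        return windowEnd(start, size).plus(retention);
    }

    public static void main(String[] args) {
        LocalTime start = LocalTime.of(14, 0);
        Duration size = Duration.ofMinutes(5);
        System.out.println("windowEnd    " + windowEnd(start, size));
        System.out.println("window close " + windowClose(start, size, Duration.ofMinutes(1)));
        System.out.println("retention to " + retentionEnd(start, size, Duration.ofHours(1)));
    }
}
```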
Stream-Stream Join
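Streams are conceptually unbounded
• Limited join scope via a sliding time window
// the ValueJoiner (second argument) combines the two matching values
leftStream.join(rightStream, (l, r) -> l + "⨝" + r, JoinWindows.of(Duration.ofMinutes(5L)));
SELECT * FROM leftStream AS l JOIN rightStream AS r WITHIN 5 MINUTES ON l.id = r.id;
[Figure: left stream records 1 @14:04, 2 @14:16, 3 @14:08; right stream records A @14:01, B @14:11, C @14:23; join results 1⨝A @14:04, 2⨝B @14:16, 3⨝B @14:11. The result timestamp is max(l.ts; r.ts).]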
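The windowed join semantics can be checked with a small in-memory sketch. It assumes all records share the same join key, treats the 5-minute window as inclusive on both sides, and stamps each result with the larger of the two timestamps. `WindowedJoinSketch` and `Event` are hypothetical names, times are minutes since midnight, and the plain nested loop is only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class WindowedJoinSketch {
    static final class Event {
        final String value;
        final int ts;

        Event(String value, int ts) {
            this.value = value;
            this.ts = ts;
        }
    }

    /** Joins every left/right pair whose timestamps differ by at most `window` minutes. */
    static List<String> join(List<Event> left, List<Event> right, int window) {
        List<String> results = new ArrayList<>();
        for (Event l : left) {
            for (Event r : right) {
                if (Math.abs(l.ts - r.ts) <= window) {
                    int ts = Math.max(l.ts, r.ts); // result timestamp: max(l.ts; r.ts)
                    results.add(l.value + "-" + r.value + "@"
                            + String.format("%02d:%02d", ts / 60, ts % 60));
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<Event> left = List.of(new Event("1", 844), new Event("2", 856), new Event("3", 848));
        List<Event> right = List.of(new Event("A", 841), new Event("B", 851), new Event("C", 863));
        System.out.println(join(left, right, 5));
    }
}
```

With the figure's records this produces 1-A at 14:04, 2-B at 14:16 and 3-B at 14:11; record C at 14:23 finds no partner within 5 minutes.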
Stream-Table Join
Stream-Table join is a temporal join
[Figure: table updates a @14:01, b @14:03, c @14:05, b @14:08, a @14:11 produce table versions at 14:01 (a), 14:03 (a, b), 14:05 (a, b, c), 14:08 (b updated) and 14:11 (a updated); stream records at 14:02, 14:04, 14:06, 14:07 and 14:10 each join against the table version valid at their event time.]
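One way to picture a temporal join is a table that remembers when each value became valid, so a stream record can look up the version that was current at its own event time. The sketch below models that result with a `TreeMap` per key. It is a model of the semantics only, not of how Kafka Streams achieves them, and the class name, method names and the keys used for the stream-side lookups are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class TemporalJoinSketch {
    // key -> (update time in minutes since midnight -> value)
    private final Map<String, TreeMap<Integer, String>> versions = new HashMap<>();

    /** Records a table update for key, valid from time ts onwards. */
    void upsert(String key, int ts, String value) {
        versions.computeIfAbsent(key, k -> new TreeMap<>()).put(ts, value);
    }

    /** Value of key as of time ts, or null if the key had no value yet. */
    String lookup(String key, int ts) {
        TreeMap<Integer, String> history = versions.get(key);
        if (history == null) {
            return null;
        }
        Map.Entry<Integer, String> entry = history.floorEntry(ts); // latest update <= ts
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        TemporalJoinSketch table = new TemporalJoinSketch();
        table.upsert("a", 841, "a@14:01");
        table.upsert("b", 843, "b@14:03");
        table.upsert("c", 845, "c@14:05");
        table.upsert("b", 848, "b@14:08");
        table.upsert("a", 851, "a@14:11");
        System.out.println(table.lookup("b", 847)); // stream record at 14:07 sees b@14:03
        System.out.println(table.lookup("b", 850)); // stream record at 14:10 sees b@14:08
    }
}
```

A stream record whose key has no table entry at its event time (for example key c at 14:02) gets no match, which is what an inner join would drop.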
You Need to Know your History
[Figure labels: Table Changelog (fully compacted, new data appended at the tail); Stream (truncation after retention time, lost history).]
Jay Kreps
Keynote: Jay Kreps, Confluent | Kafka ♥ Cloud
Learn Kafka. Start building with Apache Kafka at Confluent Developer: developer.confluent.io
APAC Kafka Summit - Best Of

More Related Content

PDF
Building Event-Driven Services with Apache Kafka
confluent
 
PDF
Making Kafka Cloud Native | Jay Kreps, Co-Founder & CEO, Confluent
HostedbyConfluent
 
PDF
Evolving from Messaging to Event Streaming
confluent
 
PDF
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
confluent
 
PDF
Death of the dumb pipes: Using Apache Kafka® for Integration projects
HostedbyConfluent
 
PDF
What is Apache Kafka and What is an Event Streaming Platform?
confluent
 
PDF
Can Apache Kafka Replace a Database?
Kai Wähner
 
PDF
Battle-tested event-driven patterns for your microservices architecture - Sca...
Natan Silnitsky
 
Building Event-Driven Services with Apache Kafka
confluent
 
Making Kafka Cloud Native | Jay Kreps, Co-Founder & CEO, Confluent
HostedbyConfluent
 
Evolving from Messaging to Event Streaming
confluent
 
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
confluent
 
Death of the dumb pipes: Using Apache Kafka® for Integration projects
HostedbyConfluent
 
What is Apache Kafka and What is an Event Streaming Platform?
confluent
 
Can Apache Kafka Replace a Database?
Kai Wähner
 
Battle-tested event-driven patterns for your microservices architecture - Sca...
Natan Silnitsky
 

What's hot (20)

PPTX
Data Streaming with Apache Kafka & MongoDB
confluent
 
PDF
Elastically Scaling Kafka Using Confluent
confluent
 
PDF
War Stories: DIY Kafka
confluent
 
PDF
Building a Web Application with Kafka as your Database
confluent
 
PDF
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, Confluent
HostedbyConfluent
 
PDF
Application modernization patterns with apache kafka, debezium, and kubernete...
Bilgin Ibryam
 
PDF
Supercharge Your Real-time Event Processing with Neo4j's Streams Kafka Connec...
HostedbyConfluent
 
PDF
The Event Mesh: real-time, event-driven, responsive APIs and beyond
Solace
 
PDF
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
confluent
 
PDF
MongoDB .local London 2019: Streaming Data on the Shoulders of Giants
Lisa Roth, PMP
 
PDF
Real time data processing and model inferncing platform with Kafka streams (N...
KafkaZone
 
PDF
Bank of China Tech Talk 2: Introduction to Streaming Data and Stream Processi...
confluent
 
PPTX
IoT and Event Streaming at Scale with Apache Kafka
confluent
 
PDF
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...
confluent
 
PDF
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
HostedbyConfluent
 
PPTX
Stream Processing Live Traffic Data with Kafka Streams
Tom Van den Bulck
 
PPTX
IIoT with Kafka and Machine Learning for Supply Chain Optimization In Real Ti...
Kai Wähner
 
PDF
Introduction to Apache Kafka and Confluent... and why they matter!
Paolo Castagna
 
PDF
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Timo Walther
 
PDF
Introducing Change Data Capture with Debezium
ChengKuan Gan
 
Data Streaming with Apache Kafka & MongoDB
confluent
 
Elastically Scaling Kafka Using Confluent
confluent
 
War Stories: DIY Kafka
confluent
 
Building a Web Application with Kafka as your Database
confluent
 
Can Apache Kafka Replace a Database? – The 2021 Update | Kai Waehner, Confluent
HostedbyConfluent
 
Application modernization patterns with apache kafka, debezium, and kubernete...
Bilgin Ibryam
 
Supercharge Your Real-time Event Processing with Neo4j's Streams Kafka Connec...
HostedbyConfluent
 
The Event Mesh: real-time, event-driven, responsive APIs and beyond
Solace
 
Confluent Workshop Series: ksqlDB로 스트리밍 앱 빌드
confluent
 
MongoDB .local London 2019: Streaming Data on the Shoulders of Giants
Lisa Roth, PMP
 
Real time data processing and model inferncing platform with Kafka streams (N...
KafkaZone
 
Bank of China Tech Talk 2: Introduction to Streaming Data and Stream Processi...
confluent
 
IoT and Event Streaming at Scale with Apache Kafka
confluent
 
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with K...
confluent
 
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
HostedbyConfluent
 
Stream Processing Live Traffic Data with Kafka Streams
Tom Van den Bulck
 
IIoT with Kafka and Machine Learning for Supply Chain Optimization In Real Ti...
Kai Wähner
 
Introduction to Apache Kafka and Confluent... and why they matter!
Paolo Castagna
 
Introduction to Stream Processing with Apache Flink (2019-11-02 Bengaluru Mee...
Timo Walther
 
Introducing Change Data Capture with Debezium
ChengKuan Gan
 
Ad

Similar to APAC Kafka Summit - Best Of (20)

PPTX
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
confluent
 
PPTX
Streaming Data Ingest and Processing with Apache Kafka
Attunity
 
PDF
Strategies For Migrating From SQL to NoSQL — The Apache Kafka Way
ScyllaDB
 
PDF
Confluent & GSI Webinars series - Session 3
confluent
 
PDF
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
Monal Daxini
 
PDF
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...
HostedbyConfluent
 
PDF
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
ScyllaDB
 
PDF
Confluent Partner Tech Talk with Synthesis
confluent
 
PDF
Webinar: SQL for Machine Data?
Crate.io
 
PDF
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Precisely
 
PPTX
Modernizing your Application Architecture with Microservices
confluent
 
PDF
Presentation cisco intelligent automation for cloud
xKinAnx
 
PDF
Capital One Delivers Risk Insights in Real Time with Stream Processing
confluent
 
PDF
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
PDF
Unit 1.2 move to cloud computing
eShikshak
 
PDF
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
 
PDF
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
confluent
 
PDF
Unleashing Apache Kafka and TensorFlow in the Cloud

Kai Wähner
 
PDF
Building Event Streaming Architectures on Scylla and Kafka
ScyllaDB
 
PDF
SpringPeople - Introduction to Cloud Computing
SpringPeople
 
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
confluent
 
Streaming Data Ingest and Processing with Apache Kafka
Attunity
 
Strategies For Migrating From SQL to NoSQL — The Apache Kafka Way
ScyllaDB
 
Confluent & GSI Webinars series - Session 3
confluent
 
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
Monal Daxini
 
From Monoliths to Microservices - A Journey With Confluent With Gayathri Veal...
HostedbyConfluent
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
ScyllaDB
 
Confluent Partner Tech Talk with Synthesis
confluent
 
Webinar: SQL for Machine Data?
Crate.io
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Precisely
 
Modernizing your Application Architecture with Microservices
confluent
 
Presentation cisco intelligent automation for cloud
xKinAnx
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
confluent
 
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
Unit 1.2 move to cloud computing
eShikshak
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
 
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
confluent
 
Unleashing Apache Kafka and TensorFlow in the Cloud

Kai Wähner
 
Building Event Streaming Architectures on Scylla and Kafka
ScyllaDB
 
SpringPeople - Introduction to Cloud Computing
SpringPeople
 
Ad

More from confluent (20)

PDF
Stream Processing Handson Workshop - Flink SQL Hands-on Workshop (Korean)
confluent
 
PPTX
Webinar Think Right - Shift Left - 19-03-2025.pptx
confluent
 
PDF
Migration, backup and restore made easy using Kannika
confluent
 
PDF
Five Things You Need to Know About Data Streaming in 2025
confluent
 
PDF
Data in Motion Tour Seoul 2024 - Keynote
confluent
 
PDF
Data in Motion Tour Seoul 2024 - Roadmap Demo
confluent
 
PDF
From Stream to Screen: Real-Time Data Streaming to Web Frontends with Conflue...
confluent
 
PDF
Confluent per il settore FSI: Accelerare l'Innovazione con il Data Streaming...
confluent
 
PDF
Data in Motion Tour 2024 Riyadh, Saudi Arabia
confluent
 
PDF
Build a Real-Time Decision Support Application for Financial Market Traders w...
confluent
 
PDF
Strumenti e Strategie di Stream Governance con Confluent Platform
confluent
 
PDF
Compose Gen-AI Apps With Real-Time Data - In Minutes, Not Weeks
confluent
 
PDF
Building Real-Time Gen AI Applications with SingleStore and Confluent
confluent
 
PDF
Unlocking value with event-driven architecture by Confluent
confluent
 
PDF
Il Data Streaming per un’AI real-time di nuova generazione
confluent
 
PDF
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
confluent
 
PDF
Break data silos with real-time connectivity using Confluent Cloud Connectors
confluent
 
PDF
Building API data products on top of your real-time data infrastructure
confluent
 
PDF
Speed Wins: From Kafka to APIs in Minutes
confluent
 
PDF
Evolving Data Governance for the Real-time Streaming and AI Era
confluent
 
Stream Processing Handson Workshop - Flink SQL Hands-on Workshop (Korean)
confluent
 
Webinar Think Right - Shift Left - 19-03-2025.pptx
confluent
 
Migration, backup and restore made easy using Kannika
confluent
 
Five Things You Need to Know About Data Streaming in 2025
confluent
 
Data in Motion Tour Seoul 2024 - Keynote
confluent
 
Data in Motion Tour Seoul 2024 - Roadmap Demo
confluent
 
From Stream to Screen: Real-Time Data Streaming to Web Frontends with Conflue...
confluent
 
Confluent per il settore FSI: Accelerare l'Innovazione con il Data Streaming...
confluent
 
Data in Motion Tour 2024 Riyadh, Saudi Arabia
confluent
 
Build a Real-Time Decision Support Application for Financial Market Traders w...
confluent
 
Strumenti e Strategie di Stream Governance con Confluent Platform
confluent
 
Compose Gen-AI Apps With Real-Time Data - In Minutes, Not Weeks
confluent
 
Building Real-Time Gen AI Applications with SingleStore and Confluent
confluent
 
Unlocking value with event-driven architecture by Confluent
confluent
 
Il Data Streaming per un’AI real-time di nuova generazione
confluent
 
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ...
confluent
 
Break data silos with real-time connectivity using Confluent Cloud Connectors
confluent
 
Building API data products on top of your real-time data infrastructure
confluent
 
Speed Wins: From Kafka to APIs in Minutes
confluent
 
Evolving Data Governance for the Real-time Streaming and AI Era
confluent
 

Recently uploaded (20)

PPTX
OA presentation.pptx OA presentation.pptx
pateldhruv002338
 
PDF
Research-Fundamentals-and-Topic-Development.pdf
ayesha butalia
 
PDF
How Open Source Changed My Career by abdelrahman ismail
a0m0rajab1
 
PPTX
New ThousandEyes Product Innovations: Cisco Live June 2025
ThousandEyes
 
PDF
GDG Cloud Munich - Intro - Luiz Carneiro - #BuildWithAI - July - Abdel.pdf
Luiz Carneiro
 
PDF
Tea4chat - another LLM Project by Kerem Atam
a0m0rajab1
 
PDF
SparkLabs Primer on Artificial Intelligence 2025
SparkLabs Group
 
PDF
MASTERDECK GRAPHSUMMIT SYDNEY (Public).pdf
Neo4j
 
PDF
Oracle AI Vector Search- Getting Started and what's new in 2025- AIOUG Yatra ...
Sandesh Rao
 
PPTX
Agile Chennai 18-19 July 2025 Ideathon | AI Powered Microfinance Literacy Gui...
AgileNetwork
 
PPTX
Introduction to Flutter by Ayush Desai.pptx
ayushdesai204
 
PDF
Google I/O Extended 2025 Baku - all ppts
HusseinMalikMammadli
 
PDF
Presentation about Hardware and Software in Computer
snehamodhawadiya
 
PPTX
What-is-the-World-Wide-Web -- Introduction
tonifi9488
 
PDF
Responsible AI and AI Ethics - By Sylvester Ebhonu
Sylvester Ebhonu
 
PDF
AI Unleashed - Shaping the Future -Starting Today - AIOUG Yatra 2025 - For Co...
Sandesh Rao
 
PDF
Economic Impact of Data Centres to the Malaysian Economy
flintglobalapac
 
PDF
Peak of Data & AI Encore - Real-Time Insights & Scalable Editing with ArcGIS
Safe Software
 
PPTX
AI and Robotics for Human Well-being.pptx
JAYMIN SUTHAR
 
PDF
Using Anchore and DefectDojo to Stand Up Your DevSecOps Function
Anchore
 
OA presentation.pptx OA presentation.pptx
pateldhruv002338
 
Research-Fundamentals-and-Topic-Development.pdf
ayesha butalia
 
How Open Source Changed My Career by abdelrahman ismail
a0m0rajab1
 
New ThousandEyes Product Innovations: Cisco Live June 2025
ThousandEyes
 
GDG Cloud Munich - Intro - Luiz Carneiro - #BuildWithAI - July - Abdel.pdf
Luiz Carneiro
 
Tea4chat - another LLM Project by Kerem Atam
a0m0rajab1
 
SparkLabs Primer on Artificial Intelligence 2025
SparkLabs Group
 
MASTERDECK GRAPHSUMMIT SYDNEY (Public).pdf
Neo4j
 
Oracle AI Vector Search- Getting Started and what's new in 2025- AIOUG Yatra ...
Sandesh Rao
 
Agile Chennai 18-19 July 2025 Ideathon | AI Powered Microfinance Literacy Gui...
AgileNetwork
 
Introduction to Flutter by Ayush Desai.pptx
ayushdesai204
 
Google I/O Extended 2025 Baku - all ppts
HusseinMalikMammadli
 
Presentation about Hardware and Software in Computer
snehamodhawadiya
 
What-is-the-World-Wide-Web -- Introduction
tonifi9488
 
Responsible AI and AI Ethics - By Sylvester Ebhonu
Sylvester Ebhonu
 
AI Unleashed - Shaping the Future -Starting Today - AIOUG Yatra 2025 - For Co...
Sandesh Rao
 
Economic Impact of Data Centres to the Malaysian Economy
flintglobalapac
 
Peak of Data & AI Encore - Real-Time Insights & Scalable Editing with ArcGIS
Safe Software
 
AI and Robotics for Human Well-being.pptx
JAYMIN SUTHAR
 
Using Anchore and DefectDojo to Stand Up Your DevSecOps Function
Anchore
 

APAC Kafka Summit - Best Of

  • 2. 2 Kafka Summit 2020 — By the numbers https://kafkasummit.io/ • 10 Keynote Speakers • 87 Sessions • 46 Birds of a Feather & Ask the Experts • 102 Session Speakers Attendees from 143 Countries
  • 3. Sessions 3 • Opening Keynote — Gwen Shapira • Feed your SIEM Smart with Kafka Connect • Building a Modern, Scalable Cyber Intelligence Platform with Confluent Kafka • Learnings From the Field. Lessons From Working with Dozens of Small & Large Deployments • Maximize the Business Value of Machine Learning and Data Science with Kafka • Measuring your Digital Transformation: Why Real Time Analytics are the Critical Next Step • MQTT and Apache Kafka: The Solution to Poor Internet Connectivity in Africa • The Flux Capacitor of Kafka Streams and ksqlDB • Keynote: Kafka ♥ Cloud — Jay Kreps
  • 5. Phase 2 - MQTT Evaluation
  • 6. Phase 2 - MQTT Evaluation
  • 7. Phase 2 - MQTT Evaluation
  • 8. Phase 2 - MQTT Evaluation
  • 9. Phase 2 - MQTT Evaluation
  • 10. Phase 2 - MQTT Evaluation
  • 11. Phase 2 - MQTT Evaluation
  • 12. Phase 2 - MQTT Evaluation
  • 13. Phase 2 - MQTT Evaluation
  • 14. Phase 2 - MQTT Evaluation
  • 15. Feed your SIEM Smart with Kafka Connect Vitalii Rudenskyi Information Security Architect McKesson Corporation
  • 16. Background and Motivation • How to use Kafka Connect to ingest, consume and deliver data to SIEM • Migration from old generation SIEM to new SIEM solution • Not easy, not fun at all! • Not to make the mistake again • Decided to have own data ingestion and consumption, • Not to be dependent on any kind of Vendors.
  • 20. Push - NettySource Connector
  • 28. Sharing • NettySource Connector • https://github.com/vrudenskyi/kafka-connect-netty-source • PollableAPIClient Connector • https://github.com/vrudenskyi/kafka-connect-pollable-source • Transformations Library • https://github.com/vrudenskyi/kafka-connect-transform
  • 29. Key Takeaways • Kafka has become an integral part in enterprise SIEM Modernization • With Kafka and Connect, customer can take a “vendor agnostic” approach at their SIEM strategy • Kafka Connect is flexible solution to deal with various data sources/sinks and different data formats. In this particular case, 530+ Connectors have been deployed • Kafka Connect is extremely extensible and customers could customize or develop their own Connector based on requirements. • Speaker has developed customized Transformation Library to transform different part of the messages. Looking to implement Stream Processing as Next Step. • Speaker has shared smart tips on making use of Headers and has created a solution for connector High Availability.
  • 30. Jack Noel - Security Solutions Architect - Intel Building a Modern, Scalable Cyber Intelligence Platform with Confluent Kafka
  • 31. Intel Information Security’s Mission To Keep Intel’s Intel legal and secure! The mission is never done. e.g. find the balance between infosec requirements while being cost effective and agile.
  • 33. ● Data Filtering ● Data comes from Partners ● IT non-security data ● Prioritise high- value data ● Enrich Data
  • 34. ● Asset information and locations ● IP CIDR locations ● Process with KSTREAMS into clean topics
  • 35. ● Acquire data once, consume many times ● Using data is expensive ● Filtering, joining and enrich data instream to provide rich data upstream ● ML Instream - Advanced ● Become Predictive!
  • 36. ● Reduce technical debt, e.g. point to point integrations ● Always on ● Thriving community ● Slide speaks for itself!
  • 37. ● Jack Noel authored some of these
  • 38. Key Takeaways • There are a lot of security vendors, each with their own way of producing and parsing Data. Kafka Makes that more Seamless • No Vendor Lock ins, e.g. Use OS Native as much as possible. • Share data with other teams • Collect data from other teams. e.g. vendors or IT • Operational Maturity is important to ensure success. e.g. people, process and tools.
  • 39. Mitch Henderson - Customer Success Technical Architect Learnings From the Field. Lessons From Working with Dozens of Small & Large Deployments
  • 40. Key do's and don'ts for managing Kafka installations • Upgrades - do them well and often, don't fall into trap of sticking with old version • How to execute upgrades well • Monitoring: JMX • Configuration - varying from defaults, recommended tunables • Logging • Quotas • Clusters - single or multiple
  • 41. Key takeaways • If you don't have the option of running fully-managed Apache Kafka as Confluent Cloud - Kafka is a distributed system and takes careful and deliberate management. • There are many recommended settings changes from default for certain types of production-operations - OOTB settings are really development settings. • Make upgrades part of Kafka muscle-memory. • Don't wait until you have problems to set recommended settings and guardrails. • If in-doubt, hire a professional - Confluent PS.
  • 42. Tom Szumowski - Senior Data Scientist, Nuuly Chirag Dadia - Directory of Engineering, Nuuly Maximize the Business Value of Machine Learning and Data Science with Kafka
  • 43. Background and Motivation Nuuly is a clothing rental subscription service driven by a Kafka-based architecture. As an online platform, they are continually looking to improve their service to better meet the needs of their customers and drive revenue. Data analytics and machine learning form a large part of their optimisation strategy, but how do you combine this mostly offline, batch-style processing with a real-time platform? Challenge - typical warehouses track SKU and stock level, whereas Nuuly tracks individual items. Real-time inventory becomes critical - a rented item must not be rented twice at the same time.
  • 46. Data Science • Everything is asynchronous • ETL pipelines transform, sent to warehouse • ML that integrates with user interactions • Stream processors used to materialise state • Kafka is our data store • Adding new steps is as easy as building a new microservice
  • 55. Measuring your Digital Transformation: Why Real Time Analytics are the Critical Next Step Rachel Pedreschi, Vice President of Community, Imply
  • 63. Velocity of Data vs Velocity of Understanding
  • 64. Key Takeaways • Shared vision across the organization - common view of the world (data) • Event driven adds value to customer interactions • “Gen Z” expectations - contextual, personalized, real-time • Real-time is freshness of data + fast analytics • Velocity of data and velocity of understanding are distinct measures
  • 65. Fadhili Juma Remitly Inc MQTT and Apache Kafka: The Solution to Poor Internet Connectivity in Africa
  • 66. Background and Motivation • Internet connectivity is a problem in the remote villages of Africa • There were challenges implementing agency banking in villages in Tanzania • How MQTT and Apache Kafka were used to overcome these problems *** The agency banking model is a function of certain commercial banks in Kenya, regulated by Central Bank of Kenya legislation, that allows them to contract third-party retail networks as banking agents. Upon successful application, vetting and approval, these agents are authorized to offer selected products and services on behalf of the bank. This relationship creates an agency banking business model.
  • 68. Phase 1 • Tried with REST APIs • Lots of resources consumed • Problems with connections • Lost transactions
  • 69. Phase 2 - MQTT Evaluation
  • 74. Key Takeaways • MQTT and Apache Kafka have been leveraged to provide digital banking to a region with poor internet connectivity • MQTT maintains the session from handheld devices, but it does not provide long-term message storage, and its connectivity with downstream enterprise solutions is not great • Apache Kafka stores the data for a longer period of time and, with Connectors, the data can be pushed to all the downstream systems where further analytics can be done • The MQTT connector is used as the bridge between Kafka and MQTT • Since managing Kafka is a specialized job that requires a lot of effort and money, they are gradually moving to Confluent Cloud
  • 75. Matthias J. Sax | Software Engineer, Confluent The Flux Capacitor of Kafka Streams and ksqlDB
  • 76. Stream Processing is our Density.
  • 77. Recap: Time 101 (@MatthiasJSax) Event Time • When an event happened (embedded in the message/record) • Ensures deterministic processing • Used to express processing semantics, i.e., impacts the result Processing Time (aka Wall-clock Time) • When an event/message/record is processed • Used for non-functional properties: timeouts, data rate control, periodic actions • Should not impact the result: otherwise, non-deterministic
  • 78. Yeah, well, history is gonna change • Input records with descending event timestamp are considered out-of-order • Out-of-order if event-time < stream-time • [Diagram: records with event times 14:01, 14:03, 14:08 and 14:11 advance stream-time; two out-of-order records (14:01 and 14:02) arrive while stream-time is 14:03 and 14:08 and do not advance it]
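The out-of-order rule on this slide can be sketched in a few lines of plain Java. This is only an illustration, not Kafka Streams code: the class name `StreamTimeTracker` is hypothetical, and `LocalTime` stands in for the epoch-millisecond timestamps that real records carry.

```java
import java.time.LocalTime;

/** Hypothetical helper: tracks stream-time as the highest event-time seen so far. */
final class StreamTimeTracker {
    private LocalTime streamTime = null; // no record observed yet

    /** Observes one record and returns true if it is out-of-order (event-time < stream-time). */
    boolean observe(LocalTime eventTime) {
        boolean outOfOrder = streamTime != null && eventTime.isBefore(streamTime);
        if (!outOfOrder) {
            streamTime = eventTime; // stream-time only ever moves forward
        }
        return outOfOrder;
    }

    LocalTime streamTime() {
        return streamTime;
    }
}
```

Feeding it a sequence modelled on the slide's diagram (14:01, 14:03, 14:01, 14:08, 14:02, 14:11) flags the third and fifth records as out-of-order, and stream-time ends at 14:11.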
  • 79. You are not thinking fourth-dimensionally • [Diagram: two input partitions, Topic-A Partition 0 and Topic-B Partition 0, with buffered record timestamps between 14:01 and 14:11; one partition's buffer is empty] • Pause processing and poll() for new data. • Unblock when timeout max.task.idle.ms hits.
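A simplified model of the behaviour described on this slide: pick the buffered record with the smallest timestamp across partitions, but pause while any partition buffer is empty and the idle timeout has not yet elapsed. The class `TimestampSync`, its method and its parameters are hypothetical names for illustration; the actual Kafka Streams implementation is more involved.

```java
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of timestamp synchronization across input partitions. */
final class TimestampSync {
    static final int PAUSE = -1;

    /**
     * Returns the index of the partition whose head record should be processed next,
     * or PAUSE if a buffer is empty and we have idled for less than maxTaskIdleMs.
     */
    static int choosePartition(List<Deque<Long>> buffers, long idledMs, long maxTaskIdleMs) {
        boolean anyEmpty = buffers.stream().anyMatch(Deque::isEmpty);
        if (anyEmpty && idledMs < maxTaskIdleMs) {
            return PAUSE; // keep polling: the empty partition may still deliver older data
        }
        int best = PAUSE;
        long bestTs = Long.MAX_VALUE;
        for (int i = 0; i < buffers.size(); i++) {
            Long head = buffers.get(i).peekFirst(); // null when the buffer is empty
            if (head != null && head < bestTs) {
                best = i;
                bestTs = head;
            }
        }
        return best;
    }
}
```

Once the timeout hits, processing continues with whatever data is buffered, which is how records from the previously empty partition can later show up out-of-order.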
  • 80. When the hell are they?
  • 81. Time Windows: Tumbling Windows • fixed size / non-overlapping / grouped (i.e., GROUP BY) • [Diagram: window boundaries at 14:00, 14:05, 14:10, 14:15] • No variable size window support yet: weeks, months, years • No out-of-the-box time zone support • https://github.com/confluentinc/kafka-streams-examples/blob/5.5.0-post/src/test/java/io/confluent/examples/streams/window/DailyTimeWindows.java
  • 82. Time Windows: Hopping Windows • fixed size / overlapping / grouped (i.e., GROUP BY) • Different to a sliding window! • [Diagram: 5-minute windows that advance by 1 minute, with boundaries at 14:00/14:05/14:10/14:15, 14:01/14:06/14:11/14:16, 14:02/14:07/14:12/14:17, 14:03/14:08/14:13/14:18 and 14:04/14:09/14:14/14:19]
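To make the difference between tumbling and hopping windows concrete, here is a minimal sketch of how a record timestamp maps to window start times. It assumes non-negative timestamps expressed in an arbitrary unit (minutes since midnight in the example) and windows of the form [start, start + size); `WindowAssigner` is a hypothetical name, not a Kafka Streams class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: which fixed-size windows contain a given timestamp. */
final class WindowAssigner {
    /** Start of the single tumbling window [start, start + size) that contains ts. */
    static long tumblingStart(long ts, long size) {
        return ts - (ts % size);
    }

    /** Starts of all hopping windows [start, start + size) that contain ts, in ascending order. */
    static List<Long> hoppingStarts(long ts, long size, long advance) {
        List<Long> starts = new ArrayList<>();
        long latest = ts - (ts % advance); // latest window start that is <= ts
        for (long start = latest; start > ts - size && start >= 0; start -= advance) {
            starts.add(0, start); // prepend so the result stays ascending
        }
        return starts;
    }
}
```

With 14:07 encoded as 847 minutes, a 5-minute tumbling window yields the single start 845 (14:05), while 5-minute windows hopping by 1 minute yield five starts, 843 to 847 (14:03 to 14:07): the same record is counted in every overlapping window.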
  • 83. Sliding Windows • Different use-case: aggregate the data of the last (e.g.) 10 minutes • Window boundaries are data dependent and unknown upfront (cf. KIP-450) • [Diagram: records at 14:03, 14:07, 14:12, 14:19 and 14:26 produce the 10-minute windows 13:53-14:03, 13:57-14:07, 14:02-14:12, 14:04-14:14, 14:08-14:18, 14:09-14:19, 14:13-14:23, 14:16-14:26 and 14:20-14:30]
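The data-dependent boundaries can be reproduced with a small sketch. It assumes the rule suggested by the slide's diagram: every record closes a window that ends at its own timestamp, and opens a window starting one time unit after it, provided some later record falls inside that window. Timestamps are minutes since midnight here for readability (real timestamps are in milliseconds), and `SlidingWindows` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch: derive sliding-window start times from record timestamps. */
final class SlidingWindows {
    /** Returns the sorted start times of all windows [start, start + size] (both ends inclusive). */
    static List<Long> windowStarts(List<Long> recordTs, long size) {
        TreeSet<Long> starts = new TreeSet<>();
        for (long ts : recordTs) {
            starts.add(ts - size);  // window that ends exactly at this record
            long next = ts + 1;     // candidate window that starts right after this record
            boolean nonEmpty = recordTs.stream().anyMatch(t -> t >= next && t <= next + size);
            if (nonEmpty) {
                starts.add(next);   // only created if it contains at least one record
            }
        }
        return new ArrayList<>(starts);
    }
}
```

For the records on the slide (843, 847, 852, 859, 866, i.e. 14:03 to 14:26) and a size of 10, this returns 833, 837, 842, 844, 848, 849, 853, 856, 860, which are the nine window starts 13:53 to 14:20 shown in the diagram.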
  • 84. When we are processing, we don’t need watermarks • Grace period: defines a cut-off for out-of-order records that are (too) late • Grace period is defined per operator • Late if stream-time - event-time > grace period • Late data is ignored and not processed by the operator • [Diagram: with grace := 5min, an out-of-order record that arrives while stream-time is 14:08 with a delay of 6min is late]
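The lateness rule quoted on this slide translates directly into a comparison of durations. The sketch below applies the slide's per-record formula as written; `GracePeriod` is a hypothetical helper, and `LocalTime` is used only to keep the example readable.

```java
import java.time.Duration;
import java.time.LocalTime;

/** Hypothetical sketch of the slide's rule: late if stream-time - event-time > grace period. */
final class GracePeriod {
    static boolean isLate(LocalTime streamTime, LocalTime eventTime, Duration grace) {
        Duration delay = Duration.between(eventTime, streamTime); // how far the record lags behind
        return delay.compareTo(grace) > 0;
    }
}
```

With a 5-minute grace period, a record with event-time 14:02 arriving at stream-time 14:08 has a 6-minute delay and is late, whereas a record with event-time 14:01 arriving at stream-time 14:03 is out-of-order but still accepted.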
  • 85. Retention Time • How long to store data in a (windowed) table. • Kafka Streams: TimeWindows.of(Duration.ofMinutes(5L)).grace(Duration.ofMinutes(1L)) with Materialized.as(…).withRetention(Duration.ofHours(1L)) • ksqlDB: WINDOW TUMBLING(SIZE 5 MINUTES, GRACE PERIOD 1 MINUTE, RETENTION TIME 1 HOUR) • [Diagram along stream-time: SIZE 5 MINUTES, GRACE PERIOD 1 MINUTE; windowStart @14:00, windowEnd @14:05, window close @14:06; retention (1 hour) from 14:05 to 15:05]
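The timeline in this slide's diagram is simple arithmetic on the three durations. The sketch below follows the diagram as drawn, including retention measured from the window end (14:05 to 15:05); that starting point is an assumption taken from the slide, and `WindowLifecycle` is a hypothetical name.

```java
import java.time.Duration;
import java.time.LocalTime;

/** Hypothetical sketch: key points in the life of one window, as drawn on the slide. */
final class WindowLifecycle {
    final LocalTime start;
    final LocalTime end;           // start + size
    final LocalTime close;         // end + grace: later records for this window are dropped
    final LocalTime retainedUntil; // end + retention, following the slide's diagram

    WindowLifecycle(LocalTime start, Duration size, Duration grace, Duration retention) {
        this.start = start;
        this.end = start.plus(size);
        this.close = this.end.plus(grace);
        this.retainedUntil = this.end.plus(retention);
    }
}
```

For a window starting at 14:00 with size 5 minutes, grace 1 minute and retention 1 hour, this gives end 14:05, close 14:06 and retention until 15:05, matching the diagram.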
  • 86. Stream-Stream Join • Streams are conceptually unbounded • Limited join scope via a sliding time window • Kafka Streams: leftStream.join(rightStream, JoinWindows.of(Duration.ofMinutes(5L))); • ksqlDB: SELECT * FROM leftStream AS l JOIN rightStream AS r WITHIN 5 MINUTES ON l.id = r.id; • [Diagram: left records 1 @14:04, 2 @14:16, 3 @14:08; right records A @14:01, B @14:11, C @14:23; join results 1⨝A @14:04, 2⨝B @14:16, 3⨝B @14:11; the result timestamp is max(l.ts; r.ts)]
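The join semantics in the diagram can be checked with a small batch sketch: two records join when their timestamps differ by at most the window size, and the result carries the larger of the two timestamps. This is a deliberately naive nested loop, not the incremental, state-store-based implementation in Kafka Streams; it assumes all records share the same join key, encodes timestamps as minutes since midnight, and uses hypothetical names (`WindowedJoin`, `Event`).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a windowed stream-stream join over records with one shared key. */
final class WindowedJoin {
    static final class Event {
        final String value;
        final int ts; // minutes since midnight

        Event(String value, int ts) {
            this.value = value;
            this.ts = ts;
        }
    }

    /** Joins every left/right pair whose timestamps are at most windowMinutes apart. */
    static List<Event> join(List<Event> left, List<Event> right, int windowMinutes) {
        List<Event> out = new ArrayList<>();
        for (Event l : left) {
            for (Event r : right) {
                if (Math.abs(l.ts - r.ts) <= windowMinutes) {
                    // the result timestamp is max(l.ts, r.ts), as on the slide
                    out.add(new Event(l.value + "-" + r.value, Math.max(l.ts, r.ts)));
                }
            }
        }
        return out;
    }
}
```

Running it on the slide's data (left 1 @844, 2 @856, 3 @848; right A @841, B @851, C @863) with a 5-minute window produces 1-A @844, 2-B @856 and 3-B @851, i.e. 14:04, 14:16 and 14:11. Record C never joins because no left record lies within 5 minutes of 14:23.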
  • 87. Stream-Table Join • Stream-Table join is a temporal join • [Diagram: table updates a @14:01, b @14:03, c @14:05, b @14:08 and a @14:11 produce table versions at 14:01, 14:03, 14:05, 14:08 and 14:11; stream records arrive at 14:02, 14:04, 14:06, 14:07 and 14:10]
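One way to picture a temporal join is as a lookup against the table version that was valid at the stream record's timestamp. The sketch below is a conceptual model only and makes no claim about how Kafka Streams stores table state; `VersionedTable` and its methods are hypothetical, the values are placeholders, and timestamps are minutes since midnight.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: a table that remembers when each key was updated. */
final class VersionedTable {
    // key -> (update timestamp -> value)
    private final Map<String, TreeMap<Integer, String>> versions = new HashMap<>();

    void put(String key, int ts, String value) {
        versions.computeIfAbsent(key, k -> new TreeMap<>()).put(ts, value);
    }

    /** Returns the value of key as of timestamp ts, or null if the key did not exist yet. */
    String getAsOf(String key, int ts) {
        TreeMap<Integer, String> history = versions.get(key);
        if (history == null) {
            return null;
        }
        Map.Entry<Integer, String> latest = history.floorEntry(ts); // newest update at or before ts
        return latest == null ? null : latest.getValue();
    }
}
```

Using the slide's update times for key b (14:03 and 14:08), a stream record at 14:04 sees the 14:03 version, a record at 14:10 sees the 14:08 version, and a record at 14:02 finds no match because b is not yet in the table.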
  • 88. You Need to Know your History • [Diagram of a table changelog stream with the labels: truncation, retention time, lost history, fully compacted, append new data (tail)]
  • 89. Keynote: Jay Kreps, Confluent | Kafka ♥ Cloud
  • 107. Learn Kafka. Start building with Apache Kafka at Confluent Developer. developer.confluent.io