1
Apache Kafka: An Open Source Event Streaming Platform
Real-time acquisition, analysis and evaluation of data streams
2
Introduction
Event Streaming
3
ETL/Data Integration
Strengths: Highly Scalable, Durable, Persistent, Ordered
Weaknesses: Batch, Expensive, Time Consuming
Messaging
Strengths: Real-time
Weaknesses: Difficult to Scale, No Persistence After Consumption, No Replay
4
Event Streaming
Highly Scalable
Persistent
5
Event Streaming: Highly Scalable, Durable, Persistent, Maintains Order, Fast (Low Latency)
ETL/Data Integration: What happened in the world (stored records)
Messaging: What is happening in the world (transient messages)
Event Streaming: What is contextually happening in the world (data as a continually updating stream of events)
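The difference between messaging ("No Persistence After Consumption", "No Replay") and a persistent, ordered event log can be illustrated with a toy in-memory model. This is a minimal Python sketch, not how Kafka is implemented: the classes `TransientQueue` and `EventLog` are hypothetical names, and a real Kafka topic is partitioned, replicated and stored on disk.

```python
from collections import deque


class TransientQueue:
    """Messaging-style queue: a message is gone once it is consumed."""

    def __init__(self):
        self._items = deque()

    def publish(self, msg):
        self._items.append(msg)

    def consume(self):
        # popleft removes the message, so no other reader can see it again
        return self._items.popleft() if self._items else None


class EventLog:
    """Event-streaming-style log: append-only, ordered, readable by offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read_from(self, offset):
        # Reading deletes nothing, so any consumer can replay from any offset
        return self._records[offset:]


q = TransientQueue()
log = EventLog()
for event in ["txn-1", "txn-2", "txn-3"]:
    q.publish(event)
    log.append(event)

first = q.consume()                # "txn-1" is now gone from the queue
live_consumer = log.read_from(2)   # only the newest record: ["txn-3"]
replay = log.read_from(0)          # full history: ["txn-1", "txn-2", "txn-3"]
```

Because `EventLog.read_from` never removes records, a newly started consumer can begin at offset 0 and rebuild context from history, while an existing consumer continues from its own offset.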
6
Why Combine Real-time With Historical Context?
Event-Driven App (Location Tracking): "Where is my driver?"
Only real-time events. Messaging queues and event streaming platforms can do this.
Contextual Event-Driven App (ETA): "When will my driver get here?" (2 min)
Real-time combined with stored data. Only event streaming platforms can do this.
7
Event Streaming Paradigm
Highly Scalable
Durable
Persistent
Maintains Order
Fast (Low Latency)
Event Streaming
8
Stream Processing
Create and store materialized views
Filter
Analyze in-flight
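The three stream processing operations above (filter, analyze in-flight, create and store a materialized view) can be sketched over a small list of events standing in for a topic. The event fields and the threshold are hypothetical; a stream processor such as KSQL or Kafka Streams would run the same logic continuously and keep the view in a fault-tolerant state store.

```python
from collections import defaultdict

# Hypothetical ATM transaction events standing in for records read from a topic
events = [
    {"account_id": "a1", "amount": 50},
    {"account_id": "a2", "amount": 400},
    {"account_id": "a1", "amount": 300},
    {"account_id": "a2", "amount": 20},
]

LARGE_AMOUNT = 100  # hypothetical threshold for the filter

totals = defaultdict(int)   # materialized view: running total per account
large_withdrawals = []      # output of the filter

for event in events:  # handle each event as it arrives, not as a batch
    # Analyze in-flight: update the view incrementally for this one event
    totals[event["account_id"]] += event["amount"]
    # Filter: forward only events matching the predicate
    if event["amount"] >= LARGE_AMOUNT:
        large_withdrawals.append(event)

print(dict(totals))  # {'a1': 350, 'a2': 420}
```

The view is never recomputed from scratch: each event changes only the entry for its own key, which is what lets stream processors keep results current at low latency.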
9 CONFIDENTIAL
Apache Kafka, the de facto open source standard for event streaming
Real-time | Uses an append-only log on disk, so performance stays constant even at petabyte scale
Scalable | Distributed; scales quickly and easily without downtime
Persistent | Persists messages on disk; enables intra-cluster replication
Reliable | Replicates data; automatically rebalances consumers upon failure
In production at more than a third of the Fortune 500
2 trillion messages a day at LinkedIn
500 billion events a day (1.3 PB) at Netflix
10
About Confluent: We Are the Kafka Experts
30% of Fortune 100
Confluent founders created Kafka
Confluent team wrote 80% of Kafka
We have over 300,000 hours of Kafka experience
11
Kafka Integration Architecture
Diagram labels: Producer, Consumer
12
Stream Processing Analogy
Diagram: Connect API → Kafka Cluster / Stream Processing → Connect API
$ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
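The Unix pipe analogy maps naturally onto chained Python generators: each stage consumes records one at a time and passes results downstream, much as stream processing sits between a source connector and a sink connector. This is only an illustration of the analogy; `io.StringIO` stands in for `in.txt`, and `str.upper()` matches `tr a-z A-Z` only for ASCII input.

```python
import io


def read_lines(source):
    """Like `cat < in.txt`: emit lines one at a time."""
    for line in source:
        yield line.rstrip("\n")


def grep(lines, pattern):
    """Like `grep "ksql"`: pass through only lines containing the pattern."""
    for line in lines:
        if pattern in line:
            yield line


def to_upper(lines):
    """Like `tr a-z A-Z`: transform each line (equivalent for ASCII text)."""
    for line in lines:
        yield line.upper()


# io.StringIO stands in for in.txt so the sketch runs without files
source = io.StringIO("kafka streams\nksql rocks\nkafka connect\nksql udfs\n")
out = list(to_upper(grep(read_lines(source), "ksql")))
print(out)  # ['KSQL ROCKS', 'KSQL UDFS']
```

Nothing is processed until `list()` pulls from the last stage, so each line flows through the whole pipeline before the next is read, the same record-at-a-time behavior the analogy points to.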
13
KSQL is the Streaming SQL Engine for Apache Kafka
14
Stream processing with KSQL
CREATE STREAM ATM_POSSIBLE_FRAUD_ENRICHED AS
  SELECT t.account_id,
         a.first_name + ' ' + a.last_name cust_name,
         t.atm, t.amount,
         TIMESTAMPTOSTRING(t.ROWTIME, 'HH:mm:ss') tx_time
    FROM atm_txns t
         INNER JOIN accounts a
         ON t.account_id = a.account_id;
Simple SQL syntax for expressing reasoning along and across data streams.
You can write user-defined functions in Java.
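The stream-table join above can be sketched in plain Python to show what KSQL does conceptually: the table holds the latest value per key (built from a changelog), each stream event looks up its key, and the INNER JOIN drops events with no match. Simplifying assumptions: the sample data is hypothetical, the table is fully built before the stream is processed (a real stream-table join uses the table state at the moment each event is processed), and `tx_time` is rendered in UTC here for reproducibility.

```python
from datetime import datetime, timezone

# Table: latest value per key, built from a changelog (later updates win)
accounts_changelog = [
    ("a1", {"first_name": "Jane", "last_name": "Doe"}),
    ("a2", {"first_name": "John", "last_name": "Roe"}),
    ("a1", {"first_name": "Jane", "last_name": "Smith"}),  # update to key a1
]
accounts = {}
for key, value in accounts_changelog:
    accounts[key] = value

# Stream: ATM transactions carrying ROWTIME in epoch milliseconds
atm_txns = [
    {"account_id": "a1", "atm": "ATM-1", "amount": 100, "rowtime": 0},
    {"account_id": "a9", "atm": "ATM-2", "amount": 50, "rowtime": 1_000},  # no account
    {"account_id": "a2", "atm": "ATM-3", "amount": 20, "rowtime": 3_723_000},
]


def enrich(txn):
    """Join one stream event to the table; return None if the key is missing."""
    account = accounts.get(txn["account_id"])
    if account is None:
        return None  # INNER JOIN semantics: unmatched events are dropped
    tx_time = datetime.fromtimestamp(txn["rowtime"] / 1000, tz=timezone.utc)
    return {
        "account_id": txn["account_id"],
        "cust_name": account["first_name"] + " " + account["last_name"],
        "atm": txn["atm"],
        "amount": txn["amount"],
        "tx_time": tx_time.strftime("%H:%M:%S"),  # like 'HH:mm:ss'
    }


enriched = [row for row in map(enrich, atm_txns) if row is not None]
```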
15
KSQL in Development and Production
Interactive KSQL for development and testing, over REST: "Hmm, let me try out this idea..."
Headless KSQL for production, once the desired KSQL queries have been identified
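In interactive mode, clients such as the KSQL CLI send statements to the KSQL server over REST. A minimal sketch of building such a request with Python's standard library follows. It assumes a server at `http://localhost:8088` (8088 is KSQL's usual default port) and the `/ksql` endpoint, which accepts a JSON body with a `ksql` statement string and optional `streamsProperties`; check the REST API reference for your KSQL version before relying on these details. The request is built but not sent, so the sketch runs without a server.

```python
import json
import urllib.request

# Assumption: a KSQL server reachable at this address
KSQL_SERVER = "http://localhost:8088"


def build_ksql_request(statement):
    """Build (but do not send) a POST to the KSQL server's /ksql endpoint."""
    body = json.dumps({"ksql": statement, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        url=KSQL_SERVER + "/ksql",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/vnd.ksql.v1+json; charset=utf-8",
            "Accept": "application/vnd.ksql.v1+json",
        },
    )


req = build_ksql_request("SHOW STREAMS;")
# To send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```

Headless mode works differently: the server runs a fixed set of queries and does not accept ad-hoc statements like this one.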
16
ATM Fraud Dataflow: Streaming ETL with KSQL
17
What does KSQL look like?
● First, load a topic into a stream
CREATE STREAM ATM_TXNS_GESS (account_id VARCHAR,
                             atm VARCHAR,
                             location STRUCT<lon DOUBLE, lat DOUBLE>,
                             amount INT,
                             timestamp VARCHAR,
                             transaction_id VARCHAR)
  WITH (KAFKA_TOPIC='atm_txns_gess',
        VALUE_FORMAT='JSON',
        TIMESTAMP='timestamp',
        TIMESTAMP_FORMAT='yyyy-MM-dd HH:mm:ss X');
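The `TIMESTAMP` and `TIMESTAMP_FORMAT` properties tell KSQL to use the event's own `timestamp` field, rather than the Kafka record timestamp, as the event time (`ROWTIME`). A rough Python equivalent of that parsing step is below. The strptime pattern `%Y-%m-%d %H:%M:%S %z` is assumed to be close enough to the Java pattern `yyyy-MM-dd HH:mm:ss X` for offsets written as `+0100` or `Z` (Python's `%z` accepts `Z` from 3.7); hour-only offsets such as `+01`, which the Java pattern allows, would need extra handling.

```python
from datetime import datetime

# Assumed Python counterpart of the Java pattern 'yyyy-MM-dd HH:mm:ss X'
PY_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def to_rowtime_ms(ts):
    """Parse an event's timestamp string into epoch milliseconds (event time)."""
    parsed = datetime.strptime(ts, PY_FORMAT)  # timezone-aware because of %z
    return round(parsed.timestamp() * 1000)


# Hypothetical event payload matching the ATM_TXNS_GESS schema
event = {"account_id": "a1", "timestamp": "2019-07-08 10:00:00 +0100"}
rowtime = to_rowtime_ms(event["timestamp"])
print(rowtime)  # 1562576400000, i.e. 2019-07-08 09:00:00 UTC
```

Using the embedded event time matters for the fraud logic: the gap between two withdrawals should reflect when they happened at the ATMs, not when the records reached Kafka.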
18
What does KSQL look like?
● Create a table on a topic for reference data
● Join the stream to the table for enrichment
CREATE TABLE ACCOUNTS
  WITH (KAFKA_TOPIC='ACCOUNTS', VALUE_FORMAT='AVRO', KEY='ACCOUNT_ID');

CREATE STREAM ATM_POSSIBLE_FRAUD_ENRICHED AS
  SELECT T.ACCOUNT_ID AS ACCOUNT_ID, T.TX1_TIMESTAMP,
         T.TX2_TIMESTAMP, T.TX1_AMOUNT, T.TX2_AMOUNT,
         T.TX1_ATM, T.TX2_ATM, T.TX1_LOCATION, T.TX2_LOCATION,
         T.TX1_TRANSACTION_ID, T.TX2_TRANSACTION_ID,
         T.DISTANCE_BETWEEN_TXN_KM, T.MILLISECONDS_DIFFERENCE,
         T.MINUTES_DIFFERENCE, T.KMH_REQUIRED,
         A.FIRST_NAME + ' ' + A.LAST_NAME AS CUSTOMER_NAME,
         A.EMAIL AS CUSTOMER_EMAIL, A.PHONE AS CUSTOMER_PHONE,
         A.ADDRESS AS CUSTOMER_ADDRESS, A.COUNTRY AS CUSTOMER_COUNTRY
    FROM ATM_POSSIBLE_FRAUD T
         INNER JOIN ACCOUNTS A
         ON T.ACCOUNT_ID = A.ACCOUNT_ID;
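The slide does not show how `ATM_POSSIBLE_FRAUD` derives `DISTANCE_BETWEEN_TXN_KM`, `MILLISECONDS_DIFFERENCE`, `MINUTES_DIFFERENCE` and `KMH_REQUIRED`. Assuming a great-circle (haversine) distance between the two ATM locations and the difference between the two event times, the arithmetic looks roughly like the Python sketch below; the function names and sample coordinates are hypothetical. A very high required speed suggests that one card cannot physically have been used at both ATMs.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def fraud_metrics(tx1, tx2):
    """Compute values analogous to the ATM_POSSIBLE_FRAUD columns for two transactions."""
    distance_km = haversine_km(tx1["lat"], tx1["lon"], tx2["lat"], tx2["lon"])
    ms_diff = abs(tx2["rowtime"] - tx1["rowtime"])
    hours = ms_diff / 3_600_000
    return {
        "DISTANCE_BETWEEN_TXN_KM": distance_km,
        "MILLISECONDS_DIFFERENCE": ms_diff,
        "MINUTES_DIFFERENCE": ms_diff / 60_000,
        # Simultaneous transactions at different places need "infinite" speed
        "KMH_REQUIRED": distance_km / hours if hours else math.inf,
    }


# Hypothetical pair: one degree of longitude apart on the equator, 10 minutes apart
tx1 = {"lat": 0.0, "lon": 0.0, "rowtime": 0}
tx2 = {"lat": 0.0, "lon": 1.0, "rowtime": 600_000}
m = fraud_metrics(tx1, tx2)
```

For this pair the distance is about 111 km, so covering it in 10 minutes would require roughly 667 km/h; a downstream filter could flag any pair above a plausible travel speed.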
19
Demo!
20
Or use the Kafka Streams API
● Java or Scala
● Can do multiple joins in one operation
● Provides an interactive query API for querying the state store directly
ATM Fraud Detection with Apache Kafka and KSQL
@rmoff
Confluent Hub
hub.confluent.io
One-stop place to discover and download:
• Connectors
• Transformations
• Converters
22
Real-time Operations View & Analysis
23
Confluent Community: What next?
About 10,000 Kafkateers collaborate every day on the Confluent Community Slack channel!
There are more than 35,000 Kafkateers in around 145 meetup groups across all five continents!
Join the Confluent Community Slack channel: cnfl.io/community-slack
Join your local Apache Kafka® Meetup: cnfl.io/meetups
Subscribe to the Confluent blog for frequent updates from key names in Apache Kafka® on best practices, product updates & more: cnfl.io/read
Apache, Apache Kafka, Kafka and the Kafka logo are trademarks of the Apache Software Foundation. The Apache Software Foundation has no affiliation with and does not endorse the materials provided at this event.
24
NOMINATE YOURSELF OR A PEER AT CONFLUENT.IO/NOMINATE
25
CONFLUENT COMMUNITY DISCOUNT CODE: KS19Meetup
25% OFF*
*Standard Priced Conference pass

Confluent real time_acquisition_analysis_and_evaluation_of_data_streams_20190708v01
