1
Simplified Machine Learning Architecture
with an Event Streaming Platform
Kai Waehner | Technology Evangelist, Confluent
contact@kai-waehner.de | LinkedIn | @KaiWaehner | www.confluent.io | www.kai-waehner.de
2
Machine Learning to Improve Traditional
and to Build New Use Cases
Seconds Minutes Hours
Windows of Opportunity
Real Time
Tracking
Predictive
Maintenance
Fraud
Detection
Cross Selling
Transportation
Rerouting
Customer
Service
Inventory
Management
Autonomous
Driving
Face
Recognition
Robotics
Speech
Translation
Video
Generation
Supply Chain
Optimization
Strategic
Planning
3
Global Automotive Company
Builds Connected Car Infrastructure
Digital Transformation
• Improve customer experience
• Increase revenue
• Reduce risk
Time
3 years ago | Today | 2 years in the future
Project begins Connected car infrastructure
in production for first use cases
Improved processes leveraging
machine learning (predictive
maintenance, cross-selling)
4
Streaming Analytics for
Predictive Maintenance at Scale
IoT
Integration
Layer
Batch
Analytics
Platform
BI
Dashboard
Streaming
Platform
Big Data
Integration
Layer
Car Sensors
Streaming Platform
Other Components
Real Time
Monitoring
System
All
Data
Critical
Data
Ingest
Data
Human
Intelligence
5
Machine Learning (ML)
...allows computers to find hidden insights without
being explicitly programmed where to look.
Machine
Learning
• Decision Trees
• Naïve Bayes
• Clustering
• Neural Networks
• Etc.
Deep
Learning
• CNN
• RNN
• Transformer
• Autoencoder
• Etc.
6
Streaming Analytics for
Predictive Maintenance at Scale
IoT
Integration
Layer
Batch
Analytics
Platform
BI
Dashboard
Streaming
Platform
Big Data
Integration
Layer
Car Sensors
Streaming Platform
Analytics Platform
Other Components
Real Time
Monitoring
System
All
Data
Critical
Data
Ingest
Data
Potential
Detect
Data
Processing
Analytics
Platform
Train Analytic
Model
Consume
Data
Preprocess
Data
Analytic Model
Deploy Analytic
Model
7
The First
Analytic Models
How to deploy the models
in production?
…real-time processing?
…at scale?
…24/7 with zero downtime?
8
Hidden Technical Debt
in Machine Learning Systems
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
9
Scalable,
Technology-Agnostic
Machine Learning
Infrastructures
https://www.infoq.com/presentations/netflix-ml-meson
https://eng.uber.com/michelangelo
https://www.infoq.com/presentations/paypal-data-service-fraud
10
Event Streaming Platform –
The Commit Log
Time
P
C1 C2
C3
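The commit-log idea on this slide (one producer P appending, consumers C1 to C3 each reading at their own position) can be sketched in a few lines of plain Python. This is a toy in-memory stand-in, not the Kafka client API; the class names `CommitLog` and `Consumer` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CommitLog:
    """Toy append-only log; existing records are never modified."""
    records: list = field(default_factory=list)

    def append(self, record):
        """Producer side: append a record and return its offset."""
        self.records.append(record)
        return len(self.records) - 1

    def read(self, offset, max_records=10):
        """Consumer side: read up to max_records starting at offset."""
        return self.records[offset:offset + max_records]


@dataclass
class Consumer:
    """Each consumer keeps its own offset, so readers do not affect each other."""
    log: CommitLog
    offset: int = 0

    def poll(self, max_records=10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = CommitLog()
for i in range(5):
    log.append(f"event-{i}")

c1 = Consumer(log)            # starts at the beginning of the log
c2 = Consumer(log, offset=3)  # starts later, independent of c1

first = c1.poll(max_records=2)
second = c2.poll()
```

Because reading only advances a consumer's own offset, a slow batch consumer and a fast real-time consumer can share the same log without coordinating.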
11
Event Streaming Platform –
A Distributed System
Broker 1
Topic1
partition1
Broker 2 Broker 3 Broker 4
Topic1
partition1
Topic1
partition1
Leader Follower
Topic1
partition2
Topic1
partition2
Topic1
partition2
Topic1
partition3
Topic1
partition4
Topic1
partition3
Topic1
partition3
Topic1
partition4
Topic1
partition4
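The layout above (four brokers, four partitions of Topic1, three copies of each partition with one leader and two followers) can be approximated with a simple round-robin placement. This is a simplified sketch under that assumption; Kafka's actual replica assignment also takes other factors into account, and the function name `assign_replicas` is hypothetical.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Place replica r of partition p on broker (p + r) mod n.

    The first replica of each partition is treated as the leader,
    the remaining ones as followers.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for p in range(num_partitions):
        replicas = [
            brokers[(p + r) % len(brokers)] for r in range(replication_factor)
        ]
        assignment[p] = {"leader": replicas[0], "followers": replicas[1:]}
    return assignment


brokers = ["broker1", "broker2", "broker3", "broker4"]
layout = assign_replicas(num_partitions=4, brokers=brokers, replication_factor=3)
```

With this placement every broker leads exactly one partition and stores three replicas in total, so both load and storage are spread evenly, and losing one broker still leaves two copies of every partition.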
12
A Streaming Platform
is the Underpinning of an Event-driven Architecture
Microservices
DBs
SaaS apps
Mobile
Customer 360
Real-time fraud
detection
Data warehouse
Producers
Consumers
Database
change
Microservices
events
SaaS
data
Customer
experiences
Streams of real time events
Stream processing apps
Connectors
Connectors
Stream processing apps
13
Apache Kafka at Scale
at Tech Giants
> 4.5 trillion messages / day > 6 Petabytes / day
“You name it”
* Kafka is not just used by tech giants
** Kafka is not just used for big data
14
Business Value per Use Case
Business
Value
Improve
Customer
Experience
(CX)
Increase
Revenue
(make money)
Decrease
Costs
(save money)
Core Business
Platform
Increase
Operational
Efficiency
Migrate to
Cloud
Mitigate Risk
(protect money)
Key Drivers
Strategic Objectives
(sample)
Fraud
Detection
IoT sensor
ingestion
Digital
replatforming/
Mainframe Offload
Connected Car: Navigation & improved
in-car experience: Audi
Customer 360
Simplifying Omni-channel Retail at Scale:
Target
Faster transactional
processing / analysis
incl. Machine Learning / AI
Mainframe Offload: RBC
Microservices
Architecture
Online Fraud Detection
Online Security
(syslog, log aggregation,
Splunk replacement)
Middleware
replacement
Regulatory
Digital
Transformation
Application Modernization: Multiple
Examples
Website / Core
Operations
(Central Nervous System)
The [Silicon Valley] Digital Natives:
LinkedIn, Netflix, Uber, Yelp...
Predictive Maintenance: Audi
Streaming Platform in a regulated
environment (e.g. Electronic Medical
Records): Celmatix
Real-time app
updates
Real Time Streaming Platform for
Communications and Beyond: Capital One
Developer Velocity - Building Stateful
Financial Applications with Kafka Streams:
Funding Circle
Detect Fraud & Prevent Fraud in Real
Time: PayPal
Kafka as a Service - A Tale of Security and
Multi-Tenancy: Apple
Example Use Cases
$↑
$↓
$↔
Example Case Studies
(of many)
15
Apache Kafka’s
Open Ecosystem as Infrastructure for ML
16
Apache Kafka’s
Open Ecosystem as Infrastructure for ML
Kafka
Streams /
KSQL
Kafka
Connect
REST Proxy
Schema Registry
Go / .NET / Python
Kafka Producer
KSQL
Kafka
Streams
17
Ingestion of
IoT Data
Replication
MirrorMaker /
Confluent Replicator
Kafka Connect
Analytics /
Machine
Learning
Cars
Cars
Cars
Cars
Cars
18
Data
Preprocessing
Preprocessing
Filter, transform, anonymize, extract features
Streams
Data Ready
For Model Training
19
SELECT car_id, event_id, c.car_model_id, sensor_input
FROM car_sensor c
LEFT JOIN car_models m ON c.car_model_id = m.car_model_id
WHERE m.car_model_type = 'Audi_A8';
Preprocessing
with KSQL
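The same preprocessing step (join each sensor event with its car model, keep one model type, project a fixed set of columns) can be written as a plain Python generator to make the logic explicit. The sample records, the `owner` field, and the function name `preprocess` are hypothetical; only the column names come from the query above.

```python
# Hypothetical reference table, keyed by car_model_id.
car_models = {
    "m1": {"car_model_type": "Audi_A8"},
    "m2": {"car_model_type": "Audi_A4"},
}

# Columns kept in the output; everything else (e.g. personal data) is dropped.
OUTPUT_COLUMNS = ("car_id", "event_id", "car_model_id", "sensor_input")


def preprocess(events, models, model_type="Audi_A8"):
    """Yield projected events whose car model matches model_type."""
    for event in events:
        model = models.get(event["car_model_id"])  # join on car_model_id
        if model is not None and model["car_model_type"] == model_type:
            yield {column: event[column] for column in OUTPUT_COLUMNS}


events = [
    {"car_id": "c1", "event_id": 1, "car_model_id": "m1", "sensor_input": 0.7, "owner": "alice"},
    {"car_id": "c2", "event_id": 2, "car_model_id": "m2", "sensor_input": 0.4, "owner": "bob"},
    {"car_id": "c3", "event_id": 3, "car_model_id": "m9", "sensor_input": 0.1, "owner": "carol"},
]

result = list(preprocess(events, car_models))
```

Note that the filter on the model type discards events without a matching model, so the left join behaves like an inner join here.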
20
Data Ingestion into a Data Store for Model Training
(and Consumption by other Decoupled Applications)
Connect
Preprocessed
Data
Batch Near Real Time Real Time
21
Extreme scale using
TensorFlow and
TPUs in the cloud!
Analytic Model
Model Training
Using an Elastic
Infrastructure in
the Cloud
22
TensorFlow Model —
Autoencoder for Anomaly Detection
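An autoencoder detects anomalies through reconstruction error: it learns to reproduce normal inputs, so an input it reconstructs poorly is flagged. The sketch below shows only that decision rule. The clamping "reconstructor" is a hypothetical stand-in for a trained TensorFlow model, and the threshold value is an assumption chosen for the example.

```python
def reconstruction_error(x, x_hat):
    """Mean squared error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def make_clamp_reconstructor(lo, hi):
    """Stand-in for a trained autoencoder (assumption, not a real model).

    It can only reproduce values inside the range [lo, hi] that it is
    assumed to have seen during training.
    """
    return lambda x: [min(max(v, lo), hi) for v in x]


def is_anomaly(x, reconstruct, threshold):
    """Flag x when the model cannot reconstruct it well enough."""
    return reconstruction_error(x, reconstruct(x)) > threshold


reconstruct = make_clamp_reconstructor(0.0, 1.0)
normal = [0.2, 0.5, 0.9]
spike = [0.2, 0.5, 4.0]

normal_flag = is_anomaly(normal, reconstruct, threshold=0.5)
spike_flag = is_anomaly(spike, reconstruct, threshold=0.5)
```

In practice the threshold would be derived from the error distribution on known-good data rather than fixed by hand.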
23
Direct streaming ingestion
for model training
with TensorFlow I/O + Kafka Plugin
(no additional data storage
like S3 or HDFS required!)
Time
Model B | Model A
Producer
Distributed Commit Log
Streaming Ingestion and Model Training
with TensorFlow IO
https://github.com/tensorflow/io
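The point of this slide is that the log itself is the training data source: different models can be trained at different times by reading offset ranges from the same log, with no copy into a separate store. The following is a plain-Python stand-in for that idea and does not use the TensorFlow I/O API; the "model" is just a mean value, and all names are hypothetical.

```python
from statistics import mean

log = []  # stand-in for one topic partition of the distributed commit log


def produce(values):
    """Producer side: append new sensor readings to the log."""
    log.extend(values)


def train_from_log(start_offset, end_offset):
    """'Train' a trivial model (the mean reading) directly from a log range."""
    window = log[start_offset:end_offset]
    return {"mean": mean(window), "trained_to_offset": end_offset}


produce([1.0, 2.0, 3.0])
model_a = train_from_log(0, len(log))  # Model A sees the first three records

produce([5.0, 9.0])
model_b = train_from_log(0, len(log))  # Model B replays the log and sees all five
```

Because records stay in the log, Model B can be trained later on a longer history, or Model A can be retrained from the exact offset range it originally used.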
24
Local Predictions
Model Training
in Cloud
Model Deployment
at the Edge
Analytic Model
Separation of
Model Training and Model Inference
25
Streams
Input Event
Prediction
Request
Response
Model Serving
TensorFlow Serving
gRPC / HTTP
Application
Stream Processing with External Model and RPC
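In this pattern the stream processing application sends each input event to a separate model server and waits for the response. A minimal sketch of the HTTP variant, assuming TensorFlow Serving's REST predict endpoint on its default REST port 8501; the host, the model name `anomaly_detector`, and the helper names are hypothetical.

```python
import json
import urllib.request


def build_predict_request(host, model_name, instances, port=8501):
    """Build a POST request for TensorFlow Serving's REST predict endpoint."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(request, timeout=2.0):
    """Send the request and return the 'predictions' list from the reply.

    Requires a reachable model server; not called in this sketch.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["predictions"]


# One input event with three sensor values (made-up numbers).
req = build_predict_request("localhost", "anomaly_detector", [[0.1, 0.2, 0.3]])
```

The trade-off of this design is an extra network round trip per event (or per batch) and a dependency on the model server's availability, in exchange for deploying and versioning the model independently of the streaming application.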
26
Prediction
Stream Processing
Model
doPrediction()
return value
Stream Processing
with Embedded Model
Streams
Input Event
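With an embedded model, the prediction is a local function call inside the stream processor, so no network hop is involved. A minimal sketch in plain Python: `ThresholdModel` is a hypothetical stand-in for a model loaded into the application at startup, and `do_prediction` mirrors the `doPrediction()` call on the slide.

```python
class ThresholdModel:
    """Stand-in for a trained model loaded into the application (assumption)."""

    def __init__(self, limit):
        self.limit = limit

    def do_prediction(self, sensor_values):
        """Return True when any sensor value exceeds the limit."""
        return max(sensor_values) > self.limit


def process(stream, model):
    """Apply the embedded model to every input event and emit a prediction."""
    for event in stream:
        yield {
            "sensor_id": event["sensor_id"],
            "anomaly": model.do_prediction(event["sensor_values"]),
        }


model = ThresholdModel(limit=100.0)
input_events = [
    {"sensor_id": "s1", "sensor_values": [20.0, 35.5]},
    {"sensor_id": "s2", "sensor_values": [20.0, 140.0]},
]

predictions = list(process(input_events, model))
```

Compared with the RPC approach, latency is lower and there is no external dependency at inference time, but updating the model means redeploying or reloading it inside the application.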
27
CREATE STREAM AnomalyDetection AS
SELECT sensor_id, detectAnomaly(sensor_values)
FROM car_engine;
User Defined Function (UDF)
Model Deployment with
Apache Kafka, KSQL
and TensorFlow
28
Streaming Analytics with
Kafka and TensorFlow
MQTT
Proxy
Elasticsearch
Grafana
Kafka
Cluster
Kafka
Connect
Car Sensors
Kafka Ecosystem
TensorFlow
Other Components
Kafka
Streams
Application
All
Data
Critical
Data
Ingest
Data
Potential
Detect
KSQL
TensorFlow
Train
Analytic Model
Consume
Data
Preprocess
Data
Analytic Model
Deploy Analytic
Model
29
Demo: 100,000 Connected Cars
(Kafka + KSQL + MQTT + TensorFlow)
https://github.com/kaiwaehner/hivemq-mqtt-tensorflow-kafka-realtime-iot-machine-learning-training-inference
30
Machine Learning + Apache Kafka
→ Examples @ GitHub
https://github.com/kaiwaehner
31
Key Takeaways
Don’t underestimate the
Hidden Technical Debt
in Machine Learning
Systems
Leverage the Apache
Kafka Open Source
Ecosystem as scalable
and flexible Event
Streaming Platform
Use Streaming Machine
Learning with Kafka
and TensorFlow IO to
simplify your Big Data
Architecture
32
November 11, 2019
Steigenberger Frankfurter Hof
November 13, 2019
NOVOTEL Zürich City West
Ben Stopford
Office of the CTO
Confluent
Axel Löhn
Senior Project Manager
Deutsche Bahn
Kai Waehner,
Technologist
Confluent
Ralph Debusmann
IoT Solution Architect
Bosch Power Tools
cnfl.io/cse19frankfurt cnfl.io/cse19zurich
33
Questions?
Feedback?
Let’s Connect!
Kai Waehner | Technology Evangelist
●contact@kai-waehner.de
●@KaiWaehner
●www.kai-waehner.de
●www.confluent.io
●LinkedIn

