A Microservices Framework for Real-Time Model Scoring Using Structured Streaming
Vedant Jain
October 2018
#SAISStreaming1
About me
• Solutions Architect @ Databricks
• Ex-Hortonworks, JPMC
What will we talk about?
• Machine learning
• Model scoring
• Microservices
• Structured Streaming
• Demo
Machine Learning
“…what we want is a machine that can learn from experience.”
Types of Machine Learning
• Unsupervised Learning (readings without labels)
X     Y     Z
0.5   -3.4  9.2
-2.3  2.9   3.2
1.3   2.3   -4.5
• Supervised Learning (readings with Activity labels)
X     Y     Z     Activity
0.5   -3.4  9.2   Biking
-2.3  2.9   3.2   Standing
1.3   2.3   -4.5  Sitting
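The supervised case above can be illustrated with a minimal sketch: given the three labelled readings from the slide, predict the activity for a new reading. The 1-nearest-neighbour rule and the name `predict_activity` are assumptions chosen for brevity; they are not the model used in the talk.

```python
import math

# Labelled training rows from the slide: (X, Y, Z) readings and their activity.
TRAINING = [
    ((0.5, -3.4, 9.2), "Biking"),
    ((-2.3, 2.9, 3.2), "Standing"),
    ((1.3, 2.3, -4.5), "Sitting"),
]


def predict_activity(reading):
    """Return the label of the closest training row (1-nearest-neighbour)."""
    _, label = min(TRAINING, key=lambda row: math.dist(row[0], reading))
    return label
```

Calling `predict_activity((0.4, -3.0, 9.0))` picks the label of the nearest labelled row. An unsupervised method would receive only the X, Y, Z columns and would have to group the rows without any Activity column.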
Machine Learning Process (Continuous Learning)
• Define the problem
• Connect data source
• Acquire data
• Enrich, transform data
• Feature selection
• Build/train model
• Serve/score model
Model Scoring
• Propensity: Likelihood of a user to commit a certain action
• Affinity: How similar two products, users, etc. are
• Lead: How closely a lead matches the target profile
• Attrition/Churn: Likelihood of a customer to drop a service and/or start using a competitor’s service
• Credit: Ability of the user to keep a promise if granted access
• Anomaly Detection: Identification of rare or invalid transactions
Model Scoring on “Big Data”
• Scale
• Streaming
• Dynamic
Machine Learning Pipeline
• Drop null values
• Calculate moving averages on a 10-minute window
• Convert to a single Parquet file
• OneHotEncoder
• RF Classifier
• Cross Tabulation
• K-Means Clustering
• Hyperparameter Tuning
• Pipelines
• Deploy Pipelines to Docker
• Serve the models using Python Flask
• Phone/Watch
• Gyroscope/Accelerometer
• CSV
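The first two preparation steps in the pipeline (drop null values, then compute moving averages on a 10-minute window) can be sketched in plain Python. This is a stand-in for illustration, not the Spark job from the talk; the function name `moving_averages`, the `(timestamp_seconds, value)` input shape, and the trailing half-open window are all assumptions.

```python
from collections import deque

WINDOW_SECONDS = 10 * 60  # the 10-minute window named on the slide


def moving_averages(readings, window=WINDOW_SECONDS):
    """Yield (timestamp, average) over the trailing window, skipping nulls.

    `readings` is an iterable of (timestamp_seconds, value) sorted by
    timestamp. A value of None is treated as a null and dropped.
    The window covers timestamps in (ts - window, ts].
    """
    recent = deque()  # readings currently inside the window
    total = 0.0       # running sum of the values in `recent`
    for ts, value in readings:
        if value is None:
            continue  # drop null values
        recent.append((ts, value))
        total += value
        # Evict readings that have fallen out of the trailing window.
        while recent[0][0] <= ts - window:
            _, old_value = recent.popleft()
            total -= old_value
        yield ts, total / len(recent)
```

Keeping a running sum and a deque means each reading is added and removed once, so the cost stays linear in the number of readings regardless of window size.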
Microservices
● Properties:
  • Decomposition: Logic is broken down into multiple independent components
  • Isolation: Component services are deployed and maintained independently of one another
● Benefits:
  • Reduced regression testing time
  • Organizational autonomy
  • Cloud/on-premise agnostic
  • Scalability/reusability
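To make the isolation property concrete, here is a minimal sketch of a model-scoring service that owns its model and exposes it only over HTTP. The deck serves models with Python Flask; this sketch uses the standard library's `http.server` instead so that it runs without extra dependencies. The `score` function (vector magnitude), the `/score` path, the JSON shape, and the helper names are hypothetical placeholders.

```python
import http.client
import json
import math
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def score(features):
    """Hypothetical placeholder model: magnitude of the feature vector."""
    return math.sqrt(sum(v * v for v in features))


class ScoringHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON request body, score it, and reply with JSON.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"score": score(payload["features"])}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging in this example


def start_service(port=0):
    """Start the service on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), ScoringHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request_score(server, features):
    """Client side: POST a feature vector and return the decoded JSON reply."""
    host, port = server.server_address
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("POST", "/score", body=json.dumps({"features": features}),
                     headers={"Content-Type": "application/json"})
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()
```

The caller only knows the URL and the JSON contract, so the model behind `score` can be retrained, redeployed, or rewritten without touching the client.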
Microservices
Example:
Company A tracks users’ activity through smart devices and wants to provide tailored content to the users based on their behavior.
Microservices
X     Y     Z     Activity
0.5   -3.4  9.2   Biking
-2.3  2.9   3.2   Standing
1.3   2.3   -4.5  Sitting
‘Activity’ score
User  Bike  Stand  Sit  Cluster
A     0.1   0.3    0.6  0
B     0.0   0.3    0.7  1
Actionable Insights
Web services
Target content
Affinity Score
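One way to read the flow above is that per-reading activity predictions are rolled up into a per-user profile (the Bike/Stand/Sit columns), and profiles are then compared to obtain an affinity score. The sketch below follows that reading. The slide does not say how either number is computed, so the use of activity proportions and of cosine similarity, along with the function names, are assumptions.

```python
import math
from collections import Counter

ACTIVITIES = ("Biking", "Standing", "Sitting")


def activity_profile(predicted_activities):
    """Return the share of readings per activity, in ACTIVITIES order."""
    counts = Counter(predicted_activities)
    total = sum(counts[a] for a in ACTIVITIES)
    if total == 0:
        raise ValueError("no known activities to build a profile from")
    return [counts[a] / total for a in ACTIVITIES]


def affinity(profile_a, profile_b):
    """Cosine similarity between two profiles (assumed affinity measure)."""
    dot = sum(a * b for a, b in zip(profile_a, profile_b))
    norm_a = math.sqrt(sum(a * a for a in profile_a))
    norm_b = math.sqrt(sum(b * b for b in profile_b))
    return dot / (norm_a * norm_b)
```

For example, a user with one Biking, three Standing, and six Sitting predictions gets the profile shown for user A in the table, and `affinity` can then compare it with user B's row.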
Microservices
Isolation
Microservices
Scalability
• Consistent
• Scalable
• Fault tolerant
• Complex Event Processing
Structured Streaming
• High-level streaming API built on the Spark SQL engine
• Runs the same computation as batch queries on Datasets/DataFrames
• Event time, windowing, sessions, sources & sinks
• End-to-end exactly-once semantics
• Late data handling
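Event-time windowing and late data handling can be illustrated with a simplified plain-Python model. It is not Spark's implementation (Spark advances its watermark between micro-batches rather than per event); the function name `windowed_counts`, the tumbling 10-minute window, and the 5-minute lateness allowance are assumptions made for the sketch.

```python
from collections import defaultdict


def windowed_counts(events, window=600, allowed_lateness=300):
    """Count events per (event-time window, key), dropping data that is too late.

    `events` is an iterable of (event_time_seconds, key) in arrival order,
    which may differ from event-time order. The watermark is the largest
    event time seen so far minus `allowed_lateness`; an event is dropped
    when its whole window already lies at or before the watermark.
    """
    counts = defaultdict(int)
    dropped = []
    max_event_time = float("-inf")
    for event_time, key in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - allowed_lateness
        window_start = (event_time // window) * window  # tumbling window
        if window_start + window <= watermark:
            dropped.append((event_time, key))  # window is already final
            continue
        counts[(window_start, key)] += 1
    return dict(counts), dropped
```

An event that arrives out of order still lands in the window of its own timestamp as long as that window is ahead of the watermark; only events for windows the watermark has passed are discarded.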
ML Limitations in Streaming
• Many models/transformers/estimators are not supported
• Limited to models built in Spark MLlib
• Not ideal for Continuous Learning
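One possible way around these limitations, in keeping with the talk's title, is to keep the model outside the streaming job and look it up for every micro-batch. The sketch below shows only that decoupling in plain Python. `score_stream`, `get_scorer`, and the row shape are hypothetical, and in practice the scorer would typically be a call to a separately deployed model service.

```python
def score_stream(micro_batches, get_scorer):
    """Score each micro-batch with whatever scorer `get_scorer()` returns.

    `micro_batches` is an iterable of lists of rows; each row is a dict with
    a "features" entry. The scorer is looked up once per batch, so a
    retrained model can be swapped in without restarting the stream.
    """
    for rows in micro_batches:
        scorer = get_scorer()
        yield [{**row, "prediction": scorer(row["features"])} for row in rows]
```

Because the scorer is an ordinary callable, it is not restricted to one library: any model that can answer a request with a prediction fits behind it.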
Demo
https://github.com/vedantja/eu_summit_demo
