Copyright © 2019 Impetus Technologies, Inc.
You are prohibited from making a copy or modification of, or from redistributing, rebroadcasting, or
re-encoding of this content without the prior written consent of Impetus.
This presentation may include images from other products and services. These images are used for
illustrative purposes only. Unless explicitly stated there is no implied endorsement or sponsorship of
these products by Impetus. All copyrights and trademarks are property of their respective owners.
Anomaly Detection with Machine Learning at Scale
Our mission
Enabling a unified, clear, and present view of
your business
Agenda
Background
Steps and techniques
Enterprise challenges and solutions
Anomaly detection lifecycle
Implementation on StreamAnalytix
Q&A
Poll
Speakers
Saurabh Dutta
Technical Product Manager – StreamAnalytix
Impetus Technologies
Richa Pathak
Lead Engineer – Data Science
Impetus Technologies
What is an anomaly?
Deviation in an event from its expected
value within a group
Application of anomaly detection
Manufacturing
Cyber security
Banking and finance
Healthcare
Travel
Identification of anomalies
[Figure: example plots of Temperature and Visits, with panels labeled Normal, Anomalous, and "Normal?"]
Why is it difficult?
Normal behavior changes over time
Anomalies differ with domains
Noise tends to be similar to anomalies
Machine learning for anomaly detection
Simple yet effective approach for detecting and classifying anomalies
Steps for anomaly detection with ML
Categories
Data characteristics
Techniques
Target
Step 1: Categories
Point anomaly
Contextual anomaly
Collective anomaly
[Figure: illustration with points or regions labeled N1, N2, O1, O2, and O3]
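As a rough illustration of the first two categories, the sketch below flags point anomalies with a plain z-score and contextual anomalies by applying the same test within each context. The function names, the threshold of 3, and the sample spending figures are assumptions made for this example and do not come from the presentation.

```python
import statistics
from collections import defaultdict


def point_anomalies(values, threshold=3.0):
    """Return indices whose z-score exceeds the (assumed) threshold."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical: nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


def contextual_anomalies(records, threshold=3.0):
    """records: list of (context, value). Score each value within its own context."""
    by_context = defaultdict(list)
    for index, (context, value) in enumerate(records):
        by_context[context].append((index, value))
    flagged = []
    for items in by_context.values():
        values = [v for _, v in items]
        for local_index in point_anomalies(values, threshold):
            flagged.append(items[local_index][0])
    return sorted(flagged)


# Hypothetical daily food spending: 100 is routine in the holiday season
# but unusual on a regular day.
records = [("holiday", 100.0)] * 20 + [("regular", 20.0)] * 20 + [("regular", 100.0)]
print(contextual_anomalies(records))                # [40]
print(point_anomalies([v for _, v in records]))     # []
```

The second call shows why context matters: a global test over the same values flags nothing, while the per-context test picks out the one regular-day record with holiday-level spending.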
Step 2: Data characteristics
Data
Numerical
Discrete
Continuous
Categorical
Nominal
Ordinal
Binary
Tid  Source IP      Duration  Destination IP  Bytes  Internal
1    206.163.37.81  0.10      160.94.179.208  150    No
2    206.163.37.99  0.27      160.94.179.235  208    No
3    160.94.123.45  1.23      160.94.179.221  195    Yes
4    206.163.37.37  112.03    160.94.179.253  199    No
5    206.163.37.41  0.32      160.94.179.244  181    No
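To make the mix of data types concrete, here is a minimal sketch that encodes the five sample records as numeric features and applies a robust outlier test to the Duration column. The encoding, the choice of a median/MAD-based modified z-score, and the 3.5 cut-off are assumptions for illustration; the slides do not prescribe a method for this table.

```python
import statistics

# Rows copied from the sample table; "Internal" is a binary categorical field.
ROWS = [
    (1, "206.163.37.81", 0.10, "160.94.179.208", 150, "No"),
    (2, "206.163.37.99", 0.27, "160.94.179.235", 208, "No"),
    (3, "160.94.123.45", 1.23, "160.94.179.221", 195, "Yes"),
    (4, "206.163.37.37", 112.03, "160.94.179.253", 199, "No"),
    (5, "206.163.37.41", 0.32, "160.94.179.244", 181, "No"),
]


def encode(row):
    """Turn one record into numeric features: duration, bytes, internal flag."""
    _, _, duration, _, n_bytes, internal = row
    return [duration, float(n_bytes), 1.0 if internal == "Yes" else 0.0]


def modified_z_scores(values):
    """Robust z-scores built from the median and the median absolute deviation."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return [0.0 for _ in values]
    return [0.6745 * (v - median) / mad for v in values]


durations = [encode(row)[0] for row in ROWS]
scores = modified_z_scores(durations)
flagged = [row[0] for row, z in zip(ROWS, scores) if abs(z) > 3.5]
print(flagged)  # [4]
```

Only Tid 4, with a duration of 112.03, is flagged. A median-based score is used here because a single extreme value in a five-row sample would inflate an ordinary standard deviation.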
Step 3: Techniques
Supervised (Classification): labeled data
Semi-supervised (Novelty detection): subset of labeled data
Unsupervised (Clustering): unlabeled data
Supervised learning
Supervised (Classification)
[Diagram: training data, test data, model, result]
Algorithms: Neural networks, Decision tree, SVM, Bayesian network, K-nearest neighbor
Semi-supervised learning
Semi-supervised (Novelty detection)
[Diagram: training data, test data, model, result]
Techniques: Clustering, Gaussian, SVM, Tree-based, Neural
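One way to picture the Gaussian option for novelty detection is sketched below. It assumes the training set contains only readings known to be normal, fits a mean and standard deviation to them, and treats new values far from that mean as novel. The class name, the threshold, and the temperature values are hypothetical.

```python
import statistics


class GaussianNoveltyDetector:
    """Fit a single Gaussian to normal-only training data (an assumed setup)."""

    def fit(self, normal_values):
        self.mean = statistics.fmean(normal_values)
        self.stdev = statistics.stdev(normal_values)
        if self.stdev == 0:
            raise ValueError("training data has no spread")
        return self

    def score(self, value):
        """Distance from the training mean, in standard deviations."""
        return abs(value - self.mean) / self.stdev

    def is_novel(self, value, threshold=3.0):
        return self.score(value) > threshold


# Hypothetical sensor readings recorded during normal operation.
normal_temperatures = [70.1, 69.8, 70.4, 70.0, 69.7, 70.3, 69.9, 70.2]
detector = GaussianNoveltyDetector().fit(normal_temperatures)
print(detector.is_novel(70.5), detector.is_novel(75.0))  # False True
```

Unlike the supervised case, no anomalous examples are needed for training; the model only learns what normal looks like.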
Unsupervised learning
Unsupervised (Clustering)
[Diagram: unlabeled data, unsupervised algorithms, result]
Algorithms: K-means, Bisecting k-means, Autoencoder
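A minimal sketch of clustering-based detection, assuming the common approach of scoring each point by its distance to the nearest k-means centroid. The data points and the fixed starting centroids are made up for the example, and the implementation is a bare-bones Lloyd's algorithm rather than anything taken from StreamAnalytix.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing: converged
        assignment = new_assignment
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignment


def anomaly_scores(points, centroids):
    """Score each point by its distance to the nearest centroid."""
    return [min(math.dist(p, c) for c in centroids) for p in points]


# Two tight groups plus one stray point (hypothetical data).
points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10), (1, 1), (11, 11), (6, 25)]
centroids, assignment = kmeans(points, centroids=points[:2])
scores = anomaly_scores(points, centroids)
print(points[scores.index(max(scores))])  # (6, 25)
```

The stray point ends up with the largest score because it sits far from both cluster centers. In practice the starting centroids would usually be chosen randomly or with a seeding scheme, and the scores would feed a threshold or ranking step.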
Step 4: Target
Target: Class (labels: Anomaly, Normal)
Target: Score (scores on a Normal-to-Anomalous scale, e.g. 0.71)
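The relationship between the two target types can be sketched as follows: scores are ranked, and a cut-off turns them into labels. The 0.71 score appears on the slide; the other scores, the observation ids, and the 0.5 threshold are assumptions for illustration.

```python
def label_scores(scores, threshold=0.5):
    """scores: dict of id -> anomaly score. Returns ranked (id, score, label) rows."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        (obs_id, score, "Anomalous" if score >= threshold else "Normal")
        for obs_id, score in ranked
    ]


# Hypothetical observations; only the 0.71 value comes from the slide.
scores = {"a": 0.12, "b": 0.71, "c": 0.48}
for row in label_scores(scores):
    print(row)
# ('b', 0.71, 'Anomalous')
# ('c', 0.48, 'Normal')
# ('a', 0.12, 'Normal')
```

Keeping the raw scores lets the threshold be tuned later without rerunning the model, which a label-only output does not allow.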
Blueprint of an anomaly detection system
Raw data, Output, Insights
Anomaly detection system: Data pre-processing, Feature selection, ML technique, Evaluation, Detection and alert
Anomaly detection algorithms
Credit card fraud detection
Clustering
Decision trees
SVM
Neural networks
Health monitoring
Nearest neighbor technique
Naïve Bayes
Parametric statistical modeling
Neural networks
Fault detection in mechanical units
Random forest
Gradient boosted trees
Spectral methods
Neural networks
Recap
Poll results
Why have enterprises not been able to adopt
machine learning completely?
Challenges
Finding the right talent
Building a solution, not a model
Agility
Consolidation of data
Cross department expertise
Maintenance
Solutions
Finding the right talent—Extremely easy to use and create solutions
Building a solution not a model—Ability to prototype and create solutions
Agility—Rapidly develop and operationalize
Consolidation of data—Connect to a variety of sources to enrich data
Cross department expertise—Collaboration capabilities
Maintenance—Retraining/versioning/upgrades
StreamAnalytix Self-service ETL + Analytics
Simplify and accelerate application
development
Anomaly detection lifecycle
Training
Stages: Raw data, Structured data, Pre-processing, Feature selection, Modeling technique, Evaluation, Tuning
Anomaly detection lifecycle
Performance
Regression
• Metrics: RMSE, R2 error, Explained variance
• Actual vs. Predicted
• Feature coefficients
Classification
• Metrics: Precision, Recall, Specificity, Accuracy
• Confusion matrix
• ROC curve
• Feature importance
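For reference, the classification metrics listed above can be derived from the four cells of a binary confusion matrix, and RMSE from paired actual and predicted values. The sketch below uses the standard definitions; the function names and the sample labels are hypothetical, with 1 standing for "anomalous".

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 means anomalous."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(y_true),
    }


def rmse(actual, predicted):
    """Root mean squared error for a regression target."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical ground truth and predictions.
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 0, 1]
print(confusion_counts(y_true, y_pred))        # (3, 1, 3, 1)
print(classification_metrics(y_true, y_pred))
```

Because anomalies are usually rare, accuracy alone can look good for a model that never flags anything, which is one reason precision, recall, and specificity are reported alongside it.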
Anomaly detection lifecycle
Scoring
Stages: ETL pipeline, Model scoring, BI & downstream apps
Pipeline components: Kafka, Register as Table, SQL, TempStore, CustomerHistory
Anomaly detection lifecycle
Model management
Model selection
DEV PROD
Model deployment
Retraining & versioning
Machine failure prediction use case
Identify potential failure in machines
Dataset
CSV containing attributes
pressureInd
moistureInd
temperatureInd
device
location
provider
timestamp
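A small sketch of how a CSV with these attributes might be loaded before modeling. The column names come from the slide; the sample rows, the assumption that the three indicator columns are numeric, and the function name are hypothetical, since the actual dataset is not shown.

```python
import csv
import io

COLUMNS = [
    "pressureInd", "moistureInd", "temperatureInd",
    "device", "location", "provider", "timestamp",
]

# Made-up rows for illustration only; they are not taken from the demo dataset.
SAMPLE_CSV = """pressureInd,moistureInd,temperatureInd,device,location,provider,timestamp
0.42,0.10,71.3,dev-01,plant-a,vendor-x,2019-08-01T10:00:00
0.95,0.55,98.6,dev-02,plant-b,vendor-y,2019-08-01T10:00:05
"""


def load_readings(text):
    """Parse CSV text into dicts, converting the indicator columns to floats."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    readings = []
    for row in reader:
        for name in ("pressureInd", "moistureInd", "temperatureInd"):
            row[name] = float(row[name])  # assumes these columns are numeric
        readings.append(row)
    return readings


readings = load_readings(SAMPLE_CSV)
print(len(readings), readings[0]["device"])  # 2 dev-01
```

The categorical columns (device, location, provider) are left as strings here; they would need encoding before being passed to most of the algorithms discussed earlier.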
Demo
StreamAnalytix platform approach
Reusability
Monitoring and debugging
Promoting from Dev to Test to Production
Orchestration of multiple workflows
Auditing user changes
On-premise or cloud
Scalability and elasticity
Anomaly detection with machine learning at scale
Thank you. Questions?
Visit www.streamanalytix.com or get in touch with us at inquiry@streamanalytix.com
Meet us at the Strata Data Conference NY, Booth #1321, September 24–26

Editor's Notes

  • #4, #5: It is important to identify unusual cases within otherwise homogeneous data, because they may be significant in the respective domain. Anomaly detection is widely used for tasks such as fraud detection in the financial industry, network breach detection in cyber security, and enemy surveillance for the military. https://www.google.com/imgres?imgurl=https%3A%2F%2Fimages.slideplayer.com%2F26%2F8632187%2Fslides%2Fslide_2.jpg&imgrefurl=https%3A%2F%2Fslideplayer.com%2Fslide%2F8632187%2F&docid=fc_EKZQGxy3rzM&tbnid=XTUWmOTndm_YJM%3A&vet=10ahUKEwj2zYzj0fDjAhXNSH0KHQYfAT8QMwhBKAIwAg..i&w=960&h=720&bih=610&biw=1280&q=anomaly%20detection%20in%20cyber%20security&ved=0ahUKEwj2zYzj0fDjAhXNSH0KHQYfAT8QMwhBKAIwAg&iact=mrc&uact=8
  • #8: Anomalies can enter the data for a variety of reasons, such as malicious activity (for example credit card fraud, cyber intrusion, or terrorist activity) or the breakdown of a system. From a marketer's point of view, examples of anomalies would be a decrease in the number of movie tickets sold, a rise in the number of app downloads, or an increase in the number of visits to a website. Anomalies can be positive or negative. Anomaly detection is applicable in domains such as fraud detection, intrusion detection, fault detection, system health monitoring, and event detection in sensor networks.
  • #9: It is important to identify unusual cases within otherwise homogeneous data, because they may be significant in the respective domain. Anomaly detection is widely used for tasks such as fraud detection in the financial industry, network breach detection in cyber security, and enemy surveillance for the military. https://www.google.com/imgres?imgurl=https%3A%2F%2Fimages.slideplayer.com%2F26%2F8632187%2Fslides%2Fslide_2.jpg&imgrefurl=https%3A%2F%2Fslideplayer.com%2Fslide%2F8632187%2F&docid=fc_EKZQGxy3rzM&tbnid=XTUWmOTndm_YJM%3A&vet=10ahUKEwj2zYzj0fDjAhXNSH0KHQYfAT8QMwhBKAIwAg..i&w=960&h=720&bih=610&biw=1280&q=anomaly%20detection%20in%20cyber%20security&ved=0ahUKEwj2zYzj0fDjAhXNSH0KHQYfAT8QMwhBKAIwAg&iact=mrc&uact=8
  • #10: Time-series data collected from temperature sensors in a factory.
  • #11: https://towardsdatascience.com/a-note-about-finding-anomalies-f9cedee38f0b
  • #12: Now that we understand the challenge in anomaly detection, we first need to identify what is normal. The definition of normal depends on the application, but beyond defining it we also build a model that learns the normal pattern and then decides which behaviour or pattern is abnormal. This is where machine learning comes into the picture.
  • #13: Model development for anomaly detection depends on:
  • #14: Anomaly detection is related to, but distinct from, noise removal, which deals with removing unwanted noise from the data. Point anomaly: an object that deviates significantly from the rest of the data; a single data instance is anomalous if it is too far off from the rest. Business use case: detecting credit card fraud based on "amount spent." Contextual anomaly: an individual data instance that is anomalous in a specific context but not otherwise; the abnormality is context specific. This type of anomaly is common in time-series data. Business use case: spending $100 on food every day during the holiday season is normal, but may be odd otherwise. Collective anomaly: a collection of related data instances that is anomalous with respect to the entire data set; the instances collectively help in detecting the anomaly. Business use case: someone unexpectedly trying to copy data from a remote machine to a local host, an anomaly that would be flagged as a potential cyber attack. https://www.datascience.com/blog/python-anomaly-detection
  • #15: Need to work on this.
  • #20: An important aspect of any anomaly detection technique is the manner in which anomalies are reported. Labels: each observation is assigned a label; this is typical of classification (supervised) approaches. Scores: each observation is assigned an anomaly score; the output is a ranked list of anomalies, and a cut-off threshold is used to select them.
  • #26: Difficulty finding the right talent hinders the effectiveness of the solution. Performance metrics in SAX can help. https://www.analyticsindiamag.com/10-challenges-that-data-science-industry-still-faces/
  • #27: https://www.analyticsindiamag.com/10-challenges-that-data-science-industry-still-faces/ Maintenance: https://www.forbes.com/sites/laurencebradford/2018/09/06/8-real-challenges-data-scientists-face/#360466b6d999
  • #30: Model Development Aspects/Lifecycle
  • #31: Model Development Aspects/Lifecycle
  • #33: https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/ This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them.
  • #36: Scalability comes from distributed computing, and elasticity (scale in, scale out) is built in. Reusability of logic.