Anomaly Detection: Real World Scenarios,
Approaches and Live Implementation
WEBINAR | DECEMBER 15, 2017
Saurabh Dutta
Ravishankar Rao Vallabhajosyula
SENIOR DATA SCIENTIST, IMPETUS TECHNOLOGIES
TWITTER: @ImpetusTech
TECHNICAL PRODUCT MANAGER, STREAMANALYTIX
TWITTER: @StreamAnalytix
Agenda
• What’s an anomaly?
• Real world use cases of anomaly detection
• Key steps in anomaly detection
• A deep dive into building an anomaly detection model
• Types of anomaly detection
• Data attributes
• Approaches and methods
• A platform approach to anomaly detection
• Live implementation using StreamAnalytix
• Q & A
About Impetus
• Mission-critical technology solutions since 1996
• Fortune 500: big data clients
• 1700 people; US, India, global reach
• Unique mix of big data products and services
What’s an Anomaly?
Anomaly: an observation that deviates greatly from most other observations, i.e., a data point, behavior, or pattern that appears statistically unusual ('anomalous').
Basic qualities of an anomaly:
1. Rare
2. Significantly different from others
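The two qualities above (rare, and significantly different) can be illustrated with a z-score check. This is a minimal sketch, not the presenters' method: the function name `zscore_outliers`, the threshold of 3.0, and the sample amounts are all assumptions made for illustration.

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=3.0):
    """Return (index, value) pairs whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical transaction amounts: thirteen typical values and one extreme one.
amounts = [52, 48, 50, 51, 49, 47, 53, 50, 52, 48, 49, 51, 50, 500]
outliers = zscore_outliers(amounts)
print(outliers)
```

Note that a single extreme value inflates the mean and standard deviation it is measured against, so on small samples a fixed threshold may never be reached; robust statistics such as the median are a common alternative.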
What is different about modern anomaly detection?
• Rule-based methods are hard to scale
• Modern data science techniques are more efficient
• Can work with real-time data
• Improve detection across multiple channels
• Learn and detect variations
• Adaptable to multiple domains
Real world use cases of anomaly detection
Anomaly detection is influencing business decisions across verticals
• MANUFACTURING: Detect abnormal machine behavior to prevent cost overruns
• FINANCE & INSURANCE: Detect and prevent out-of-pattern or fraudulent spend and travel expenses
• HEALTHCARE: Detect fraud in claims and payments, and anomalous events from RFID and mobile devices
• BANKING: Flag abnormally high purchases/deposits, detect cyber intrusions
• NETWORKING: Detect intrusion into networks, prevent theft of source code or IP
• SOCIAL MEDIA: Detect compromised accounts and bots that generate fake reviews
• VIDEO SURVEILLANCE: Detect or track objects and persons of interest in monotonous footage
• SMART HOUSE: Detect energy leakage, standardize smart sensor datasets
• TELECOM: Detect roaming abuse, revenue fraud, service disruptions
• TRANSPORTATION: Ensure external communications to the vehicle are not intrusions
Key steps in anomaly detection
• Problem identification and setting expectations
• Defining the sources and schema
• Parsing and pre-processing
• Model development
• Model execution
• Investigation and feedback
• Model updating
• Operationalize model for scoring
Model development for anomaly detection
• Type of anomaly detection used
• Type of data available
• Whether the data has labels
Taxonomy of anomaly detection
Anomaly Detection
• Point Anomaly
• Contextual Anomaly
• Collective Anomaly
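To make the taxonomy concrete: a contextual anomaly is a value that is ordinary overall but unusual within its own context. The sketch below is illustrative only; the `contextual_outliers` helper, the season labels, the threshold of 2.0, and the temperature readings are assumptions, not material from the webinar.

```python
from collections import defaultdict
from statistics import mean, stdev

def contextual_outliers(readings, threshold=2.0):
    """readings: list of (context, value). Flag values unusual within their own context."""
    by_context = defaultdict(list)
    for context, value in readings:
        by_context[context].append(value)
    # Per-context mean and sample standard deviation (needs at least two values).
    stats = {c: (mean(v), stdev(v)) for c, v in by_context.items() if len(v) > 1}
    flagged = []
    for context, value in readings:
        if context not in stats:
            continue
        mu, sigma = stats[context]
        if sigma > 0 and abs(value - mu) / sigma > threshold:
            flagged.append((context, value))
    return flagged

# Hypothetical temperatures: 30 degrees is normal in summer but not in winter.
summer = [("summer", t) for t in [29, 30, 31, 30, 28, 32, 30, 29, 31]]
winter = [("winter", t) for t in [2, 3, 1, 4, 2, 3, 2, 3, 30]]
flagged = contextual_outliers(summer + winter)
print(flagged)
```

A point anomaly detector looking at all readings together would not single out 30, since it appears repeatedly in the summer data; grouping by context is what exposes it.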
Data – Types of attributes
Data
• Categorical
  - Nominal: named categories
  - Ordinal: categories with an implied order
• Numerical
  - Discrete: only particular numbers
  - Continuous: any numerical value
• Binary: variables with only two options (Yes/No)
Data – Choice of algorithm
• Data has no labels, time-stamps absent: apply K-means clustering
• Data has no labels, time-stamps present: apply time-series anomaly detection algorithms
• Data has labels, time-stamps absent: use standard machine learning classifiers
• Data has labels, time-stamps present: use sequence classification algorithms
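For the unlabeled case, one common way to use K-means for anomaly detection is to score each point by its distance to the nearest centroid. The sketch below is a plain-Python illustration under stated assumptions: the deterministic initialisation, the fixed iteration count, and the sample points are all hypothetical, and a production system would normally rely on a library implementation.

```python
import math

def kmeans(points, k, iterations=20):
    """Tiny K-means. Initialisation picks k evenly spaced input points (an assumption
    made here for reproducibility; real implementations use smarter seeding)."""
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members; keep it if the cluster is empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids

def anomaly_scores(points, centroids):
    """Score = distance to the nearest centroid; larger means more unusual."""
    return [min(math.dist(p, c) for c in centroids) for p in points]

# Hypothetical 2-D data: two tight groups and one stray point.
points = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9), (1.0, 1.0),
          (5.0, 5.1), (4.9, 5.0), (5.1, 4.9), (5.0, 5.0),
          (9.0, 1.0)]
centroids = kmeans(points, k=2)
scores = anomaly_scores(points, centroids)
most_anomalous = points[scores.index(max(scores))]
print(most_anomalous)
```

The stray point still pulls its cluster's centroid toward itself, which is one reason distance-based scores are usually compared against a threshold tuned on known-good data.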
Approaches to anomaly detection
• Supervised (Classification): a model is built from training data, then applied to test data to produce a result. Challenges: data skewness, lack of counter examples
• Semi-supervised (Novelty detection): a model is built from training data, then applied to test data to produce a result. Requires a 'normal' training dataset
• Unsupervised (Clustering): an unsupervised algorithm builds a model from unlabeled data to produce a result. Faces the curse of dimensionality
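The semi-supervised (novelty detection) approach can be sketched as: fit a profile on data known to be normal, then score unseen records against it. The `NoveltyDetector` class, its per-feature z-score rule, and the sample rows below are assumptions chosen for brevity, not the technique used in the live implementation.

```python
from statistics import mean, stdev

class NoveltyDetector:
    """Learns per-feature mean and spread from 'normal' rows only."""

    def fit(self, normal_rows):
        columns = list(zip(*normal_rows))
        self.means = [mean(c) for c in columns]
        # Assumes every feature varies in the training data (non-zero spread).
        self.stds = [stdev(c) for c in columns]
        return self

    def score(self, row):
        # Largest absolute z-score across features.
        return max(abs(x - m) / s for x, m, s in zip(row, self.means, self.stds))

    def is_novel(self, row, threshold=3.0):
        return self.score(row) > threshold

# Hypothetical normal transactions: (amount, distance from residence).
normal_rows = [(40, 2.0), (55, 3.0), (60, 1.5), (45, 2.5), (50, 2.0), (52, 3.5), (48, 1.0)]
detector = NoveltyDetector().fit(normal_rows)
print(detector.is_novel((51, 2.2)), detector.is_novel((500, 2.0)))
```

Because training never sees an anomaly, this sidesteps the lack of counter examples noted for the supervised approach, at the cost of needing a training set that really is clean.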
Methods for anomaly detection: Categorical and numeric attributes
• K-modes: uses Hamming distance to measure distance for categorical features
• Generic mixture models: extend the framework of Gaussian mixture models
• Robust SVM: kernel-based approach that identifies regions in which data resides in an alternate feature space
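The Hamming distance used by K-modes counts the attributes on which two categorical records disagree. The sketch below shows only that distance and a single cluster "mode" (the most frequent value per column); it is not a full K-modes implementation, and the attribute names and records are hypothetical.

```python
from collections import Counter

def hamming(a, b):
    """Number of positions at which two equal-length records differ."""
    return sum(x != y for x, y in zip(a, b))

def column_modes(rows):
    """Most frequent value in each column, i.e. the mode of a single cluster."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

# Hypothetical records: (channel, card type, country).
rows = [
    ("web", "credit", "US"),
    ("web", "credit", "US"),
    ("web", "debit", "US"),
    ("pos", "credit", "US"),
    ("web", "credit", "US"),
    ("atm", "prepaid", "RU"),
]
mode = column_modes(rows)
distances = [hamming(r, mode) for r in rows]
print(mode, distances)
```

Records far from every cluster mode are the candidates for review; here the last record differs from the mode on all three attributes.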
Methods for anomaly detection: Sequential data
• State space models: model the evolution of data in time to enable forecasting, and flag an anomaly if the deviation from the forecast exceeds a threshold
• Hidden Markov models: Markov chains and HMMs measure the probability of different events happening in some sequence
• Graph-based methods: graphs capture interdependencies and allow discovery of relational associations, such as in fraud
[Figure: system, behavior model, observed behavior, expected behavior; stages: observation, model formation, simulation, anomaly detection]
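As an illustration of the Markov chain idea, the sketch below estimates transition probabilities from normal event sequences and scores a new sequence by its average log-likelihood. It is a first-order Markov chain, not a hidden Markov model; the function names, the add-one smoothing, and the event names are assumptions, and every state in a scored sequence is assumed to have appeared in training.

```python
import math
from collections import Counter, defaultdict

def fit_transitions(sequences, smoothing=1.0):
    """Estimate P(next | previous) with additive smoothing over the observed states."""
    states = sorted({s for seq in sequences for s in seq})
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for prev in states:
        total = sum(counts[prev].values()) + smoothing * len(states)
        probs[prev] = {nxt: (counts[prev][nxt] + smoothing) / total for nxt in states}
    return probs

def avg_log_likelihood(seq, probs):
    """Mean log-probability per transition; lower means less expected."""
    steps = list(zip(seq, seq[1:]))
    return sum(math.log(probs[a][b]) for a, b in steps) / len(steps)

# Hypothetical normal sessions.
normal = (
    [["login", "browse", "cart", "pay", "logout"]] * 3
    + [["login", "browse", "browse", "cart", "pay", "logout"]] * 2
)
probs = fit_transitions(normal)
typical = avg_log_likelihood(["login", "browse", "cart", "pay", "logout"], probs)
odd = avg_log_likelihood(["login", "pay", "pay", "pay", "logout"], probs)
print(typical, odd)
```

A sequence would be flagged when its score falls below a threshold chosen from the scores of known-normal sequences.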
Latest methods for anomaly detection
• Deep Learning (AutoEncoder): autoencoders learn a latent representation of the data by using an encoder and a decoder together
• Deep Learning (RNN-based): RNN-based architectures enable sequence prediction, so the network can flag an anomaly when observations diverge from its predictions
• Generative Adversarial Nets: GANs combine two neural networks, a generator and a discriminator, and can be used to find anomalies
Anomaly detection algorithms
Host-based IDS
• Statistical profiling using histograms
• Mixture of models
• Neural networks
• SVM, Rule-based systems
Network intrusion detection
• Statistical profiling using histograms
• Parametric statistical modeling
• Non-parametric statistical modeling
• Bayesian networks, Neural networks
• SVM, Rule-based systems
• Clustering based, Nearest neighbor
• Spectral, Information Theoretic
Credit card fraud detection
• Neural networks
• Rule-based systems
• Clustering, Self-organizing map
• Artificial immune system
• Decision trees, SVM
Mobile phone fraud detection
• Statistical profiling using histograms
• Parametric statistical modeling
• Neural networks, Rule-based systems
Insider trading detection
• Statistical profiling using histograms
• Information theoretic
Medical and public health
• Parametric statistical modeling
• Neural networks, Bayesian networks
• Rule-based systems
• Nearest neighbor techniques
Fault detection in mechanical units
• Parametric statistical modeling
• Non-parametric statistical modeling
• Neural networks, Spectral methods
• Rule-based systems
Structural damage detection
• Statistical profiling using histograms
• Parametric statistical modeling
• Mixture of models, Neural networks
Image processing, Surveillance
• Mixture of models, Regression, SVM
• Bayesian networks, Neural networks
• Clustering, Nearest neighbor methods
Anomalous topic detection
• Mixture of models, Neural networks
• Statistical profiling using histograms
• Clustering, SVM
Anomaly detection in sensor networks
• Parametric statistical modeling
• Bayesian networks, Nearest neighbor
• Rule-based systems, Spectral
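"Statistical profiling using histograms" appears in many of the lists above. A minimal sketch of the idea: bin historical values, then treat values that fall in rarely seen or unseen bins as suspicious. The helper names, the bin width of 5, and the call durations are assumptions for illustration.

```python
from collections import Counter

def build_profile(values, bin_width):
    """Map each histogram bin to the fraction of historical values it holds."""
    bins = Counter(int(v // bin_width) for v in values)
    total = len(values)
    return {b: n / total for b, n in bins.items()}

def rarity(value, profile, bin_width):
    """1.0 for a bin never seen before, near 0.0 for the most common bin."""
    return 1.0 - profile.get(int(value // bin_width), 0.0)

# Hypothetical call durations in minutes.
durations = [2, 3, 5, 4, 3, 2, 6, 5, 4, 3, 12, 3, 4, 2, 5]
profile = build_profile(durations, bin_width=5)
print(rarity(3, profile, 5), rarity(12, profile, 5), rarity(40, profile, 5))
```

The choice of bin width matters: bins that are too wide hide anomalies, and bins that are too narrow make ordinary values look rare.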
Poll question:
At what stage is your organization in implementing anomaly detection techniques /
solutions using advanced Data Science / Machine Learning / Real-time approaches?
Stage 0: We do not have any plans yet, I am here for education
Stage 1: We are at an initial planning stage
Stage 2: Currently evaluating platforms/ implementation partners
Stage 3: Implementation underway
Stage 4: Already using a modern anomaly detection platform/ solution
A modern platform approach to anomaly detection
• Multi-tenancy
• Rapidly develop and operationalize
• Apply data science / machine learning techniques with real-time data
• A/B testing
• Easily scalable
• Monitor, debug and diagnose at scale
• Version management
• Deployment workflow: Dev – Test – Prod
Real-time Stream Processing and Machine Learning Platform
ENABLING THE REAL-TIME ENTERPRISE
Implementing credit card fraud detection in real-time using
Schema overview
{
"isMerchantCompromised": 0,
"isfraudent": true,
"transactionAmount": 11276.0,
"phone": "1478523699",
"radiusFromResidence": 2.0,
"deviation": 10.0,
"averageTransaction": 4608.0,
"city": 3,
"transactionTime": "1512979321050",
"email": "ava@mail.com",
"name": "Jean",
"gender": "Male",
"merchantName": "My_Company",
"timeOfDay": "10:30:19",
"merchantCity": 10
}
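A record with this schema can be parsed and turned into simple features before scoring. The sketch below is hypothetical: `derive_features` and the feature names are not part of StreamAnalytix, and reading `transactionTime` as epoch milliseconds is an assumption based on its 13-digit value. Field names, including `isfraudent`, are kept exactly as they appear in the schema.

```python
import json
from datetime import datetime, timezone

record_json = """
{
  "isMerchantCompromised": 0,
  "isfraudent": true,
  "transactionAmount": 11276.0,
  "phone": "1478523699",
  "radiusFromResidence": 2.0,
  "deviation": 10.0,
  "averageTransaction": 4608.0,
  "city": 3,
  "transactionTime": "1512979321050",
  "email": "ava@mail.com",
  "name": "Jean",
  "gender": "Male",
  "merchantName": "My_Company",
  "timeOfDay": "10:30:19",
  "merchantCity": 10
}
"""

def derive_features(record):
    """Build a few illustrative features from one transaction record."""
    # Assumption: transactionTime is milliseconds since the Unix epoch.
    event_time = datetime.fromtimestamp(int(record["transactionTime"]) / 1000, tz=timezone.utc)
    return {
        # How large this purchase is relative to the cardholder's average.
        "amount_ratio": record["transactionAmount"] / record["averageTransaction"],
        # Whether the merchant is in a different city code than the cardholder.
        "different_city": record["city"] != record["merchantCity"],
        "event_time": event_time,
        "label": record["isfraudent"],
    }

record = json.loads(record_json)
features = derive_features(record)
print(features)
```

Here the transaction is roughly 2.4 times the cardholder's average and takes place in a different city code, the kind of signal a fraud model could learn from.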
Build Apache Spark Applications Within Minutes
https://www.streamanalytix.com/download
Key takeaways
• Modern data science techniques significantly improve detection of anomalies
• Anomaly detection can run on streaming data in a scalable manner
• Modern platforms can simplify implementation and shorten the development cycle
Thank you.
Questions?
© 2017 Impetus Technologies
Email: inquiry@streamanalytix.com Twitter : @ImpetusTech / @StreamAnalytix
