zekeLabs
Outlier Detection & Handling
Learning made Simpler !
www.zekeLabs.com
Agenda
● Introduction
● Novelty detection
● Statistical Methods
● OneClassSVM
● Outlier Detection
● GMM
● Elliptic Envelope
● Isolation Forest
● Local Outlier Factor
● DBSCAN
● Handling Outlier Data
Introduction
● Many applications require being able to decide whether a new observation
belongs to the same distribution as existing observations (it is an inlier), or
should be considered as different (it is an outlier).
● Often, this ability is used to clean real data sets.
● In scikit-learn's convention, inliers are labeled 1, while outliers are labeled -1.
Novelty Detection
● Consider a data set of n observations from the same distribution
described by p features.
● Consider now that we add one more observation to that data set.
● Is the new observation so different from the others that we can doubt it is
regular?
● The goal is to learn a rough, close frontier delimiting the contour of the
distribution of the initial observations, plotted in the embedding
p-dimensional space.
Statistical Methods
● Z-score
● Plotting
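The Z-score rule can be sketched as follows. This is a minimal illustration, not the deck's own code: the helper name `zscore_outliers`, the threshold of 3 and the sample values are all assumptions made for the example.

```python
import numpy as np

def zscore_outliers(values, threshold=3.0):
    """Return a boolean mask marking values whose |z-score| exceeds threshold."""
    values = np.asarray(values, dtype=float)
    std = values.std()  # population standard deviation (ddof=0)
    if std == 0:
        # All values are identical, so nothing can be an outlier.
        return np.zeros(values.shape, dtype=bool)
    z = (values - values.mean()) / std
    return np.abs(z) > threshold

# Hypothetical sample: fifteen ordinary readings and one extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 10, 11, 13, 12, 95]
mask = zscore_outliers(data)
print(np.flatnonzero(mask))  # indices of the flagged values
```

A caveat with this rule: the mean and standard deviation are themselves pulled by the outliers, so on very small samples an extreme value may never reach the threshold.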
Novelty Detection using OneClassSVM
● The training data is not polluted by outliers.
● One-class SVM is an unsupervised
algorithm that learns a decision
function for novelty detection:
classifying new data as similar or
different to the training set.
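A minimal sketch of novelty detection with scikit-learn's `OneClassSVM`, assuming a clean synthetic training set. The data, the `nu` value and the two test points are illustrative choices, not taken from the slides.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)

# Assumed clean training data: a tight 2-D Gaussian blob around the origin.
X_train = 0.3 * rng.randn(200, 2)

# New observations: one close to the training data, one far away from it.
X_new = np.array([[0.0, 0.1], [4.0, 4.0]])

# nu is an upper bound on the fraction of training points treated as errors.
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
clf.fit(X_train)

pred = clf.predict(X_new)  # 1 for inliers, -1 for novelties
print(pred)
```

The model is fitted on the clean data only and then applied to unseen points, which is what distinguishes novelty detection from outlier detection on a polluted set.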
Outlier Detection
● Separate the regular observations from the polluting ones.
● Three ways of doing outlier detection: Elliptic Envelope, Isolation Forest,
Local Outlier Factor.
Elliptic Envelope
● One common way of performing outlier
detection is to assume that the regular
data come from a known distribution
(e.g. data are Gaussian distributed).
● It tries to define the “shape” of the data,
and can define outlying observations as
observations which stand far enough
from the fit shape.
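The idea can be tried with scikit-learn's `EllipticEnvelope`, which fits a robust covariance estimate and flags points far from it. This is a sketch under assumptions: the Gaussian sample, the three planted outliers and the `contamination` value are made up for the illustration.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

rng = np.random.RandomState(42)

# Assumed regular data: correlated 2-D Gaussian observations.
X_in = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=200)

# Three planted points far from the Gaussian "shape".
X_out = np.array([[6.0, -6.0], [-7.0, 7.0], [8.0, 8.0]])
X = np.vstack([X_in, X_out])

# contamination is the assumed proportion of outliers in the data set.
env = EllipticEnvelope(contamination=0.02, random_state=42)
labels = env.fit_predict(X)  # 1 for inliers, -1 for outliers

print(labels[-3:])  # labels of the planted points
```

Because the method assumes roughly Gaussian, unimodal data, it tends to work poorly when the regular observations form several clusters.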
Isolation Forest
● One efficient way of performing
outlier detection in high-dimensional
datasets is to use random forests.
● Built on the basis of decision trees.
● Outliers lie farther away from the
regular observations.
● Random partitioning produces
noticeably shorter paths for
anomalies.
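A short sketch with scikit-learn's `IsolationForest` shows the shorter-path effect through the anomaly scores. The five-feature synthetic data, the planted points and the `contamination` value are assumptions chosen for the example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)

# Assumed regular data: 300 observations described by 5 features.
X_in = rng.randn(300, 5)

# Three planted anomalies, each far from the bulk of the data.
X_out = np.array([
    [8.0, 8.0, 8.0, 8.0, 8.0],
    [-8.0, -8.0, -8.0, -8.0, -8.0],
    [8.0, -8.0, 8.0, -8.0, 8.0],
])
X = np.vstack([X_in, X_out])

forest = IsolationForest(n_estimators=100, contamination=0.02, random_state=0)
labels = forest.fit_predict(X)     # 1 for inliers, -1 for outliers
scores = forest.score_samples(X)   # lower score = shorter average path

print(labels[-3:])
print(scores[-3:].mean(), scores[:300].mean())
```

The planted points need only a few random splits to be separated from everything else, so their scores sit well below those of the regular observations.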
Local Outlier Factor
● It measures the local density
deviation of a given data point with
respect to its neighbors.
● The idea is to detect the samples
that have a substantially lower
density than their neighbors.
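The density comparison can be sketched with scikit-learn's `LocalOutlierFactor`. The two-cluster data, the planted points, `n_neighbors=20` and the `contamination` value are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(0)

# Assumed regular data: two dense clusters of 100 points each.
X_in = np.vstack([
    0.3 * rng.randn(100, 2) + [2.0, 2.0],
    0.3 * rng.randn(100, 2) - [2.0, 2.0],
])

# Two planted points in low-density regions.
X_out = np.array([[0.0, 0.0], [5.0, -5.0]])
X = np.vstack([X_in, X_out])

lof = LocalOutlierFactor(n_neighbors=20, contamination=0.02)
labels = lof.fit_predict(X)               # 1 for inliers, -1 for outliers
factors = -lof.negative_outlier_factor_   # about 1 for inliers, larger for outliers

print(labels[-2:])
print(factors[-2:])
```

A factor close to 1 means a point is about as dense as its neighbors; values well above 1 indicate a point sitting in a sparser region than the points around it.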
Handling Outliers
● Manual Analysis
● Dropping them
● Generating alerts
● Creating new feature marking outliers
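Three of these options (dropping, alerting and marking) can be sketched in a few lines, assuming a detector has already produced labels in the 1 / -1 convention. The array values and variable names below are hypothetical.

```python
import numpy as np

# Hypothetical data and detector output (1 = inlier, -1 = outlier).
X = np.array([[1.0, 2.0], [1.1, 1.9], [0.9, 2.1], [9.0, 9.5]])
labels = np.array([1, 1, 1, -1])

is_outlier = labels == -1

# Option 1: drop the outlying rows.
X_clean = X[~is_outlier]

# Option 2: keep every row and append a 0/1 indicator feature.
X_flagged = np.column_stack([X, is_outlier.astype(int)])

# Option 3: generate an alert message per outlying row for manual analysis.
alerts = [
    f"row {i}: possible outlier {X[i].tolist()}"
    for i in np.flatnonzero(is_outlier)
]

print(X_clean.shape, X_flagged.shape)
print(alerts)
```

Keeping the indicator column rather than dropping rows preserves the information that a point was unusual, which a downstream model may be able to use.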
Clustering Method - DBSCAN
● A density-based clustering method.
● A noise point N lies in no cluster:
it is neither ‘density reachable’
from nor ‘density connected’ to any
other point, so it is left on its
own and treated as an outlier.
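A minimal sketch with scikit-learn's `DBSCAN`, which assigns the label -1 to noise points. The two synthetic clusters, the isolated point, `eps` and `min_samples` are assumptions chosen for the illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.RandomState(0)

# Assumed data: two dense clusters plus one isolated point.
X = np.vstack([
    0.2 * rng.randn(50, 2),
    0.2 * rng.randn(50, 2) + [3.0, 3.0],
    [[10.0, 10.0]],
])

# eps is the neighborhood radius; min_samples is the density threshold
# a point needs within that radius to count as a core point.
db = DBSCAN(eps=0.5, min_samples=5).fit(X)

noise_mask = db.labels_ == -1  # DBSCAN labels noise points -1
clusters = set(db.labels_.tolist()) - {-1}

print(np.flatnonzero(noise_mask))
print(sorted(clusters))
```

The isolated point has no neighbors within `eps`, so it cannot be reached from any core point and ends up as noise.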
Thank You
Visit www.zekeLabs.com for more details.
Let us know how we can help your organization upskill its
employees to stay updated in the ever-evolving IT industry.
Get in touch:
www.zekeLabs.com | +91-8095465880 | info@zekeLabs.com

