Outlier detection handling

zekeLabs
Outlier Detection & Handling
Learning made Simpler !
www.zekeLabs.com

Agenda
● Introduction
● Novelty detection
● Statistical Methods
● OneClassSVM
● Outlier Detection
● GMM
● Elliptical Envelop
● Isolation Forest
● Local Outlier Factor
● DBSCAN
● Handling Outlier Data

Introduction
● Many applications require being able to decide whether a new observation
belongs to the same distribution as existing observations (it is an inlier), or
should be considered as different (it is an outlier).
● Often, this ability is used to clean real data sets.
● Inliers are labeled 1, while outliers are labeled -1.

Novelty Detection
● Consider a data set of n observations from the same distribution
described by p features.
● Consider now that we add one more observation to that data set.
● Is the new observation so different from the others that we can doubt it is
regular?
● It is about to learn a rough, close frontier delimiting the contour of the
initial observations distribution, plotted in embedding p-dimensional
space.

Statistical Methods
● Z-score
● Plotting

Novelty Detection using OneClassSVM
● Training data is not polluted.
● One-class SVM is an unsupervised
algorithm that learns a decision
function for novelty detection:
classifying new data as similar or
different to the training set.

Outlier Detection
● Separate regular observation from the polluting ones.
● Three ways of doing outlier detection
Elliptic Envelope IsolationForest Local Outlier Factor

Elliptical Envelop
● One common way of performing outlier
detection is to assume that the regular
data come from a known distribution
(e.g. data are Gaussian distributed).
● It tries to define the “shape” of the data,
and can define outlying observations as
observations which stand far enough
from the fit shape.

Isolation Forest
● It’s an efficient way of performing
outlier detection in high-dimensional
datasets is to use random forests.
● Built on the basis of decision trees
● Outliers lie further away from regular
observation.
● Random partitioning produces
noticeably shorter paths for
anomalies.

Local Outlier Factor
● It measures the local density
deviation of a given data point with
respect to its neighbors.
● The idea is to detect the samples
that have a substantially lower
density than their neighbors.

Handling Outliers
● Manual Analysis
● Dropping them
● Generating alerts
● Creating new feature marking outliers

Clustering Method - DBSCAN
● A density based clustering method
● N is an outlier point that lies in no
cluster and it is not ‘density
reachable’ nor ‘density connected’
to any other point. Thus this point
will have “his own cluster”.

Visit : www.zekeLabs.com for more details
THANK YOU
Let us know how can we help your organization to Upskill the
employees to stay updated in the ever-evolving IT Industry.
Get in touch:
www.zekeLabs.com | +91-8095465880 | info@zekeLabs.com

Outlier detection handling

More Related Content

What's hot (20)

Similar to Outlier detection handling (20)

More from zekeLabs Technologies (20)

Recently uploaded (20)

Outlier detection handling