(Paper Presentation)
OPTICS: Ordering Points To Identify the Clustering Structure
Presenter
Anu Singha
Asiya Naz
Rajesh Piryani
South Asian University
OUTLINE
• Introduction
• Definitions (Directly Density-Reachable, Density-Reachable, Density-Connected)
• OPTICS Algorithm
• Example
• Graphical Results
April 30,2012 2
CLUSTERING
• Goal
  • Group objects into meaningful subclasses, either as part of an exploratory process to gain insight into the data or as a preprocessing step for other algorithms.
• Clustering strategies
  • Hierarchical
  • Partitioning (e.g., k-means)
  • Density-based
DENSITY BASED CLUSTERING
• Density-based clustering locates regions of high density that are separated from one another by regions of low density.
• Density = number of points within a specified radius (Eps, ε)
DENSITY BASED CLUSTERING
• Flat clustering: one level of clusters, e.g., the density-based clustering algorithm DBSCAN [KDD 96]
• Hierarchical clustering: nested clusters, e.g., the density-based clustering algorithm OPTICS [SIGMOD 99]
INTRODUCTION
• DBSCAN can cluster objects given input parameters such as
  • ε (Eps), the maximum radius of a neighborhood, and
  • MinPts, the minimum number of points required in the neighborhood of a core object,
• but it burdens users with the responsibility of selecting parameter values that will lead to the discovery of acceptable clusters.
• Such parameters are usually set empirically and are difficult to determine.
• Moreover, real-world, high-dimensional data sets often have very skewed distributions, so their intrinsic clustering structure may not be well characterized by a single set of global density parameters.
INTRODUCTION
• Density-based clusters are monotonic with respect to the neighborhood threshold.
• In DBSCAN, for a fixed MinPts value and two neighborhood thresholds ε1 < ε2, a cluster C with respect to ε1 and MinPts must be a subset of a cluster C′ with respect to ε2 and MinPts.
• This means that if two objects are in the same density-based cluster, they must also be in the same cluster under a lower density requirement.
• Different clusters may have very different densities.
• Clusters may be organized in hierarchies.
To overcome the difficulty of using one set of global parameters in cluster analysis, a cluster analysis method called OPTICS was proposed.
OPTICS
• In Figure 3, C1 and C2 are density-based clusters with respect to ε2 < ε1, and C is a density-based cluster with respect to ε1 that completely contains the sets C1 and C2.
• For a constant MinPts value, density-based clusters with respect to a higher density (i.e., a lower value of ε) are completely contained in density-connected sets with respect to a lower density (i.e., a higher value of ε).
OPTICS
• To produce a consistent result, an extended DBSCAN algorithm must obey a specific order in which objects are processed when expanding a cluster:
  • select the object that is density-reachable with respect to the lowest ε value,
  • which guarantees that clusters w.r.t. higher density (i.e., smaller ε values) are finished first.
• OPTICS works in principle like such an extended DBSCAN algorithm for an infinite number of distance parameters εi that are smaller than a “generating distance” ε (i.e., 0 ≤ εi ≤ ε).
• The only difference is that OPTICS does not assign cluster memberships.
• Instead, it stores the order in which the objects are processed and the information that an extended DBSCAN algorithm would use to assign cluster memberships (if this were at all possible for an infinite number of parameters).
OPTICS
• OPTICS does not explicitly produce a data set clustering.
• Instead, it outputs a cluster ordering:
  • a linear list of all objects under analysis that
  • represents the density-based clustering structure of the data.
• Objects in a denser cluster are listed closer to each other in the cluster ordering.
• The ordering is equivalent to the density-based clusterings obtained from a wide range of parameter settings.
• Thus, OPTICS does not require the user to provide a specific density threshold.
• The cluster ordering can be used to extract basic clustering information (e.g., cluster centers or arbitrary-shaped clusters), to derive the intrinsic clustering structure, and to visualize the clustering.
OPTICS (CONTINUED)
• To construct the different clusterings simultaneously, the objects are processed in a specific order.
• This order selects the object that is density-reachable with respect to the lowest ε value, so that clusters with higher density (lower ε) are finished first.
• Based on this idea, OPTICS needs two important pieces of information per object:
  • core-distance
  • reachability-distance
OPTICS was presented by Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel and Jörg Sander.
TERMINOLOGIES
• ε-neighborhood
  • The objects within a radius ε of an object (epsilon-neighborhood).
• Core objects
  • An object is a core object if its ε-neighborhood contains at least MinPts objects.
[Figure: ε-neighborhoods of p and q. With MinPts = 4, p is a core object and q is not.]
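As a minimal sketch, assuming 2-D points compared with Euclidean distance and that an object counts as a member of its own ε-neighborhood, the two definitions can be written in Python as follows. The function names `eps_neighborhood` and `is_core` are illustrative, not taken from the paper.

```python
import math

Point = tuple[float, float]


def eps_neighborhood(points: list[Point], i: int, eps: float) -> list[int]:
    """Indices of all points within distance eps of points[i], including i itself.

    Illustrative helper: a linear scan over the whole data set (no index support)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def is_core(points: list[Point], i: int, eps: float, min_pts: int) -> bool:
    """points[i] is a core object if its eps-neighborhood holds at least min_pts objects."""
    return len(eps_neighborhood(points, i, eps)) >= min_pts


# Hypothetical toy data: four points in a unit square plus one distant outlier.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (5.0, 5.0)]
print(is_core(pts, 0, eps=1.5, min_pts=4))  # True: four points lie within 1.5
print(is_core(pts, 4, eps=1.5, min_pts=4))  # False: only the outlier itself
```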
TERMINOLOGIES
• Directly density-reachable
  • An object q is directly density-reachable from an object p if q is within the ε-neighborhood of p and p is a core object.
[Figure: objects p and q with their ε-neighborhoods]
• q is directly density-reachable from p.
• p is not directly density-reachable from q: direct density-reachability is not symmetric.
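A small sketch of this definition, under the same assumptions as before (Euclidean 2-D points, an object belongs to its own ε-neighborhood); the helper names are hypothetical. The example reproduces the asymmetry: an object at the edge of a core object's neighborhood is directly density-reachable from it, but not the other way round.

```python
import math


def neighborhood(points, i, eps):
    """Illustrative helper: indices within eps of points[i] (linear scan, includes i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def directly_density_reachable(points, q, p, eps, min_pts):
    """True if points[q] is directly density-reachable from points[p]:
    q lies in the eps-neighborhood of p AND p is a core object."""
    n_p = neighborhood(points, p, eps)
    return q in n_p and len(n_p) >= min_pts


# Hypothetical data: point 0 is a core object (5 objects within eps = 1.0),
# point 4 lies at the edge of its neighborhood and has only 3 objects nearby.
pts = [(0.0, 0.0), (0.5, 0.0), (-0.5, 0.0), (0.0, 0.5), (1.0, 0.0)]
print(directly_density_reachable(pts, q=4, p=0, eps=1.0, min_pts=4))  # True
print(directly_density_reachable(pts, q=0, p=4, eps=1.0, min_pts=4))  # False
```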
TERMINOLOGIES
• Density-reachable
  • An object p is density-reachable from q w.r.t. ε and MinPts if there is a chain of objects p_1, …, p_n with p_1 = q and p_n = p such that p_{i+1} is directly density-reachable from p_i w.r.t. ε and MinPts for all 1 ≤ i < n.
[Figure: a chain of objects connecting p and q]
• q is density-reachable from p.
• p is not density-reachable from q.
• Density-reachability is the transitive closure of direct density-reachability; it is not symmetric in general.
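Checking density-reachability amounts to searching for such a chain. The sketch below is one way to do it (a breadth-first search, assumed here rather than taken from the paper), again with Euclidean 2-D points and hypothetical helper names. Only core objects are expanded, because nothing is directly density-reachable from a non-core object.

```python
import math
from collections import deque


def neighborhood(points, i, eps):
    """Illustrative helper: indices within eps of points[i] (linear scan, includes i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def density_reachable(points, p, q, eps, min_pts):
    """True if points[p] is density-reachable from points[q], i.e. a chain
    q = p_1, ..., p_n = p exists where each p_{i+1} is directly
    density-reachable from p_i. Breadth-first search from q."""
    seen = {q}
    frontier = deque([q])
    while frontier:
        u = frontier.popleft()
        if u == p:
            return True
        nbrs = neighborhood(points, u, eps)
        if len(nbrs) < min_pts:
            continue  # u is not a core object, so the chain cannot continue from it
        for v in nbrs:
            if v not in seen:
                seen.add(v)
                frontier.append(v)
    return False


# Hypothetical data: five points 1.0 apart on a line. With eps = 1.0 and
# min_pts = 3 the three inner points are core objects, the two ends are not.
line = [(float(x), 0.0) for x in range(5)]
print(density_reachable(line, p=4, q=1, eps=1.0, min_pts=3))  # True
print(density_reachable(line, p=1, q=4, eps=1.0, min_pts=3))  # False
```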
TERMINOLOGIES
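• Definition: core-distance

$$
\text{core-dist}_{\varepsilon,\mathit{MinPts}}(o) =
\begin{cases}
\infty & \text{if } |\mathrm{rangeQuery}(o,\varepsilon)| < \mathit{MinPts} \\
\mathit{MinPts}\text{-dist}(o) & \text{otherwise}
\end{cases}
$$

• Definition: reachability-distance

$$
\text{reach-dist}_{\varepsilon,\mathit{MinPts}}(p,o) = \max\bigl(\text{core-dist}_{\varepsilon,\mathit{MinPts}}(o),\ \mathrm{dist}(o,p)\bigr)
$$

Here rangeQuery(o, ε) returns the ε-neighborhood of o, MinPts-dist(o) is the distance from o to its MinPts-th nearest neighbor, and ∞ corresponds to UNDEFINED in the pseudocode.

[Figure: core-distance(o) and reachability-distance(p, o) for two objects p, with MinPts = 5]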
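The two definitions translate almost directly into code. This is a sketch that assumes Euclidean distance and counts o as its own first neighbor, so MinPts-dist(o) is taken as the MinPts-th smallest distance in o's ε-neighborhood including the zero distance to o; `math.inf` stands in for UNDEFINED. The function names are illustrative.

```python
import math


def core_distance(points, o, eps, min_pts):
    """MinPts-dist(o) if o is a core object w.r.t. eps and min_pts, else math.inf.

    Illustrative helper: o itself contributes the leading 0.0 distance."""
    dists = sorted(math.dist(points[o], q) for q in points)
    within = [d for d in dists if d <= eps]
    return within[min_pts - 1] if len(within) >= min_pts else math.inf


def reachability_distance(points, p, o, eps, min_pts):
    """reach-dist(p, o) = max(core-dist(o), dist(o, p)); math.inf if o is not a core object."""
    return max(core_distance(points, o, eps, min_pts), math.dist(points[o], points[p]))


# Hypothetical data (MinPts = 3, eps = 5): the 3rd-nearest object to o = 0 is at
# distance 2, so any object closer than 2 gets reachability-distance 2.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 0.0), (10.0, 0.0)]
print(core_distance(pts, 0, eps=5.0, min_pts=3))             # 2.0
print(reachability_distance(pts, 1, 0, eps=5.0, min_pts=3))  # 2.0: core-distance dominates
print(reachability_distance(pts, 3, 0, eps=5.0, min_pts=3))  # 3.0: actual distance dominates
```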
ABOUT OPTICS COMPUTATION
• OPTICS computes an ordering of all objects in a given database, and
• it stores the core-distance and a suitable reachability-distance for each object in the database.
• OPTICS maintains a list called OrderSeeds to generate the output ordering.
• Objects in OrderSeeds are sorted by the reachability-distance from their respective closest core objects, that is, by the smallest reachability-distance of each object.
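One way to realize OrderSeeds is a binary heap keyed on reachability-distance. The sketch below is an assumption, not the paper's data structure: Python's `heapq` has no decrease-key operation, so `decrease` pushes a fresh entry and `next` skips stale ones (lazy deletion); ties are broken by insertion order. The example values mirror the (C, 40), (M, 40) and (R, 43), later (R, 21), seedlist entries from the worked example in these slides.

```python
import heapq


class OrderSeeds:
    """Illustrative priority queue of not-yet-processed objects keyed by reachability-distance."""

    def __init__(self):
        self._heap = []      # entries: (reachability-distance, insertion counter, object)
        self._current = {}   # object -> its current (smallest) reachability-distance
        self._counter = 0    # breaks ties between equal distances in insertion order

    def insert(self, obj, r_dist):
        self._current[obj] = r_dist
        heapq.heappush(self._heap, (r_dist, self._counter, obj))
        self._counter += 1

    def decrease(self, obj, r_dist):
        # heapq has no decrease-key: push a new entry; the old one becomes stale.
        self.insert(obj, r_dist)

    def empty(self):
        return not self._current

    def next(self):
        """Remove and return the object with the smallest reachability-distance."""
        while True:
            r_dist, _, obj = heapq.heappop(self._heap)
            if self._current.get(obj) == r_dist:  # skip stale entries
                del self._current[obj]
                return obj


seeds = OrderSeeds()
seeds.insert("C", 40)
seeds.insert("M", 40)
seeds.insert("R", 43)
seeds.decrease("R", 21)
print([seeds.next() for _ in range(3)])  # ['R', 'C', 'M']
print(seeds.empty())                      # True
```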
ABOUT OPTICS ALGORITHM
• Begin with an arbitrary object from the input database as the current object, p.
• Retrieve the ε-neighborhood of p, determine its core-distance, and set its reachability-distance to undefined.
• The current object, p, is then written to the output.
• If p is not a core object,
  • OPTICS simply moves on to the next object in the OrderSeeds list (or in the input database if OrderSeeds is empty).
ABOUT OPTICS ALGORITHM
• If p is a core object, then for each object q in the ε-neighborhood of p,
  • OPTICS updates the reachability-distance of q from p
  • and inserts q into OrderSeeds if q has not yet been processed.
• The iteration continues until the input is fully consumed and OrderSeeds is empty.
ALGORITHM
OPTICS (SetOfObjects, ε, MinPts, OrderedFile)
  OrderedFile.open();
  FOR i FROM 1 TO SetOfObjects.size DO
    Object := SetOfObjects.get(i);
    IF NOT Object.Processed THEN
      ExpandClusterOrder(SetOfObjects, Object, ε, MinPts, OrderedFile)
  OrderedFile.close();
END; // OPTICS
PROCEDURE FOR ExpandClusterOrder
ExpandClusterOrder(SetOfObjects, Object, ε, MinPts, OrderedFile);
  neighbors := SetOfObjects.neighbors(Object, ε);
  Object.Processed := TRUE;
  Object.reachability_distance := UNDEFINED;
  Object.setCoreDistance(neighbors, ε, MinPts);
  OrderedFile.write(Object);
  IF Object.core_distance <> UNDEFINED THEN
    OrderSeeds.update(neighbors, Object);
    WHILE NOT OrderSeeds.empty() DO
      currentObject := OrderSeeds.next();
      neighbors := SetOfObjects.neighbors(currentObject, ε);
      currentObject.Processed := TRUE;
      currentObject.setCoreDistance(neighbors, ε, MinPts);
      OrderedFile.write(currentObject);
      IF currentObject.core_distance <> UNDEFINED THEN
        OrderSeeds.update(neighbors, currentObject);
END; // ExpandClusterOrder
Each object is written to the file OrderedFile together with its core-distance and its current reachability-distance.
OrderSeeds::update()
OrderSeeds::update(neighbors, CenterObject);
  c_dist := CenterObject.core_distance;
  FORALL Object FROM neighbors DO
    IF NOT Object.Processed THEN
      new_r_dist := max(c_dist, CenterObject.dist(Object));
      IF Object.reachability_distance = UNDEFINED THEN
        Object.reachability_distance := new_r_dist;
        insert(Object, new_r_dist);
      ELSE // Object already in OrderSeeds
        IF new_r_dist < Object.reachability_distance THEN
          Object.reachability_distance := new_r_dist;
          decrease(Object, new_r_dist);
END; // OrderSeeds::update
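Putting OPTICS, ExpandClusterOrder and OrderSeeds::update together gives the following Python sketch. Assumptions: 2-D points with Euclidean distance, an object counts in its own ε-neighborhood, `math.inf` stands in for UNDEFINED, and OrderSeeds is a `heapq` heap with lazy deletion rather than a true decrease-key structure. Neighborhoods are found by linear scan, so the whole run is O(n²). Instead of writing to OrderedFile, the hypothetical `optics` function returns the cluster ordering as (index, reachability-distance, core-distance) tuples.

```python
import heapq
import math

UNDEFINED = math.inf  # stands in for the pseudocode's UNDEFINED


def optics(points, eps, min_pts):
    """Illustrative sketch: cluster ordering as (index, reachability, core-distance) tuples."""
    n = len(points)
    processed = [False] * n
    reach = [UNDEFINED] * n
    core = [UNDEFINED] * n
    ordered = []

    def neighbors(i):
        # Linear-scan eps-neighborhood query (includes i itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    def set_core_distance(i, nbrs):
        # MinPts-dist(i) if i is a core object; otherwise core[i] stays UNDEFINED.
        if len(nbrs) >= min_pts:
            core[i] = sorted(math.dist(points[i], points[j]) for j in nbrs)[min_pts - 1]

    def update(nbrs, center, seeds):
        # OrderSeeds::update: one comparison covers insert (reach is UNDEFINED)
        # and decrease (the new value is smaller).
        c_dist = core[center]
        for j in nbrs:
            if not processed[j]:
                new_r_dist = max(c_dist, math.dist(points[center], points[j]))
                if new_r_dist < reach[j]:
                    reach[j] = new_r_dist
                    heapq.heappush(seeds, (new_r_dist, j))

    def process(i):
        # Mark i processed, set its core-distance and "write" it to the output.
        nbrs = neighbors(i)
        processed[i] = True
        set_core_distance(i, nbrs)
        ordered.append(i)
        return nbrs

    for i in range(n):                      # OPTICS main loop
        if processed[i]:
            continue
        # ExpandClusterOrder(i): reach[i] is still UNDEFINED here.
        nbrs = process(i)
        seeds = []                          # OrderSeeds as a heap of (r_dist, index)
        if core[i] != UNDEFINED:
            update(nbrs, i, seeds)
            while seeds:
                r_dist, cur = heapq.heappop(seeds)
                if processed[cur] or r_dist != reach[cur]:
                    continue                # stale entry left by an earlier decrease
                nbrs = process(cur)
                if core[cur] != UNDEFINED:
                    update(nbrs, cur, seeds)
    return [(i, reach[i], core[i]) for i in ordered]


# Hypothetical data: two interleaved groups; the ordering groups them as 0 2 4 1 3 5.
pts = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
for idx, r_dist, c_dist in optics(pts, eps=3.0, min_pts=2):
    print(idx, r_dist, c_dist)
```

Each new group starts with an UNDEFINED reachability-distance, which is what the extraction and plotting steps later rely on.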
• Having generated the augmented cluster-ordering of a database with respect to ε and MinPts, we can extract any density-based clustering from this order with respect to MinPts and a clustering-distance ε′ ≤ ε
• by simply “scanning” the cluster-ordering and assigning cluster memberships depending on the reachability-distance and the core-distance of the objects.
• The fact that such an extraction is possible demonstrates that the cluster-ordering of a data set actually contains the information about the intrinsic clustering structure of that data set (up to the generating distance ε).
ExtractDBSCAN-Clustering
ExtractDBSCAN-Clustering (ClusterOrderedObjs, ε′, MinPts)
  // Precondition: ε′ ≤ generating dist ε for ClusterOrderedObjs
  ClusterId := NOISE;
  FOR i FROM 1 TO ClusterOrderedObjs.size DO
    Object := ClusterOrderedObjs.get(i);
    IF Object.reachability_distance > ε′ THEN // UNDEFINED > ε′
      IF Object.core_distance ≤ ε′ THEN
        ClusterId := nextId(ClusterId);
        Object.clusterId := ClusterId;
      ELSE
        Object.clusterId := NOISE;
    ELSE // Object.reachability_distance ≤ ε′
      Object.clusterId := ClusterId;
END; // ExtractDBSCAN-Clustering
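The extraction is a single pass over the ordering. Below is a Python sketch of the pseudocode, assuming the ordering is given as (object, reachability-distance, core-distance) tuples with `math.inf` for UNDEFINED and integer cluster ids with NOISE = -1; the function name and the sample ordering are hypothetical.

```python
import math

NOISE = -1


def extract_dbscan_clustering(cluster_ordered, eps_prime):
    """Illustrative sketch of ExtractDBSCAN-Clustering: one scan over an OPTICS ordering.

    cluster_ordered: (obj, reachability_distance, core_distance) tuples in
    cluster order, math.inf for UNDEFINED. Requires eps_prime <= generating eps."""
    labels = {}
    cluster_id = NOISE
    last_id = 0
    for obj, reach, core in cluster_ordered:
        if reach > eps_prime:               # UNDEFINED (math.inf) > eps_prime
            if core <= eps_prime:
                last_id += 1                # nextId(ClusterId): start a new cluster
                cluster_id = last_id
                labels[obj] = cluster_id
            else:
                labels[obj] = NOISE
        else:                               # reach <= eps_prime: join the current cluster
            labels[obj] = cluster_id
    return labels


# Hypothetical ordering: two clusters separated by an object whose core-distance is too large.
inf = math.inf
ordering = [("a", inf, 1.0), ("b", 1.0, 1.0), ("c", 1.0, 2.5),
            ("d", inf, 5.0), ("e", inf, 1.0), ("f", 1.5, 1.0)]
print(extract_dbscan_clustering(ordering, eps_prime=2.0))
# {'a': 1, 'b': 1, 'c': 1, 'd': -1, 'e': 2, 'f': 2}
```

Lowering `eps_prime` on the same ordering yields the clustering for a stricter density threshold without rerunning OPTICS.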
OPTICS ALGORITHM EXAMPLE
[Figure: example database with the 16 points A–N, P and R; empty reachability plot (reach axis, ε = 44)]
seedlist: (empty)
• Example Database (2-dimensional, 16 points)
• ε = 44, MinPts = 3
OPTICS ALGORITHM EXAMPLE
[Figure: example database with A as the current object and its core-distance marked; reachability plot so far: A]
seedlist: (B, 40) (I, 40)
• Example Database (2-dimensional, 16 points)
• ε = 44, MinPts = 3
OPTICS ALGORITHM EXAMPLE
[Figure: example database; reachability plot so far: A, B]
seedlist: (I, 40) (C, 40)
• Example Database (2-dimensional, 16 points)
• ε = 44, MinPts = 3
OPTICS ALGORITHM EXAMPLE
[Figure: example database; reachability plot so far: A, B, I]
seedlist: (J, 20) (K, 20) (L, 31) (C, 40) (M, 40) (R, 43)
• Example Database (2-dimensional, 16 points)
• ε = 44, MinPts = 3
OPTICS ALGORITHM EXAMPLE
[Figure: example database; reachability plot so far: A, B, I, J]
seedlist: (L, 19) (K, 20) (R, 21) (M, 30) (P, 31) (C, 40)
• Example Database (2-dimensional, 16 points)
• ε = 44, MinPts = 3
OPTICS ALGORITHM EXAMPLE
[Figure: example database; reachability plot so far: A, B, I, J, L, …]
seedlist: (M, 18) (K, 18) (R, 20) (P, 21) (N, 35) (C, 40)
• Example Database (2-dimensional, 16 points)
• ε = 44, MinPts = 3
OPTICS ALGORITHM EXAMPLE
[Figure: example database and the complete reachability plot (reach axis, ε = 44)]
seedlist: (empty)
Cluster ordering: A B I J L M K N R P C D F G E H
• Example Database (2-dimensional, 16 points)
• ε = 44, MinPts = 3
GRAPHICAL REPRESENTATION
• A data set’s cluster ordering can be represented graphically.
• This helps to visualize and understand the clustering structure of a data set.
GRAPHICAL REPRESENTATION
• The figure shows a reachability plot for a simple 2-D data set, which presents a general overview of how the data are structured and clustered.
• The data objects are plotted in cluster order (horizontal axis) together with their respective reachability-distances (vertical axis).
• The three Gaussian “bumps” in the plot reflect three clusters in the data set.
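Without a plotting library, a rough reachability plot can still be drawn as text. In this sketch (an illustration, not the authors' tool) the plot is rotated by 90°: each row is one object in cluster order and the bar length is proportional to its reachability-distance, with UNDEFINED (`math.inf`) drawn at full width and marked with `*`. The sample values are the first reachability-distances from the worked example above (A undefined, B and I 40, J 20, L 19, M 18).

```python
import math


def reachability_plot(ordering, width=40):
    """Illustrative helper: text reachability plot, one row per (label, reachability) pair."""
    finite = [r for _, r in ordering if not math.isinf(r)]
    top = max(finite, default=0.0) or 1.0   # the largest finite value fills `width`
    rows = []
    for label, r in ordering:
        if math.isinf(r):
            bar = "#" * width + " *"            # UNDEFINED reachability-distance
        else:
            bar = "#" * round(width * r / top)
        rows.append(f"{label:>3} | {bar}")
    return "\n".join(rows)


example = [("A", math.inf), ("B", 40.0), ("I", 40.0), ("J", 20.0), ("L", 19.0), ("M", 18.0)]
print(reachability_plot(example))
```

Valleys of short bars correspond to clusters; the tall bars between them mark the transitions.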
ALGORITHM PERFORMANCE
• The authors performed an extensive performance test using different data sets and different parameter settings.
• It turned out that the run-time of OPTICS was almost constantly 1.6 times the run-time of DBSCAN.
• This is not surprising, since the run-times of both OPTICS and DBSCAN are heavily dominated by
  • the run-time of the ε-neighborhood queries,
  • which must be performed for each object in the database; i.e., the run-time of both algorithms is O(n · run-time of an ε-neighborhood query).
ALGORITHM PERFORMANCE
• To retrieve the ε-neighborhood of an object o, a region query with center o and radius ε is used.
• Without any index support, answering such a region query requires a scan through the whole database.
• In this case, the run-time of OPTICS is O(n²).
• If a tree-based spatial index can be used, the run-time is reduced to O(n log n).
ALGORITHM PERFORMANCE
• The height of such a tree-based index is O(log n) for a database of n objects in the worst case, and, at least in low-dimensional spaces, a query with a “small” query region has to traverse only a limited number of paths.
• Furthermore, if we have direct access to the ε-neighborhood, e.g., if the objects are organized in a grid, the run-time is further reduced to O(n), because in a grid the complexity of a single neighborhood query is O(1).
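The grid case can be sketched as follows, assuming 2-D points, Euclidean distance and a grid whose cell side equals ε: every point within ε of a query point lies in the query's own cell or one of its 8 neighbors, so a query inspects at most 9 cells. The O(1) query cost holds only if the number of points per cell is bounded (for example, for roughly uniform data), which is an assumption of this sketch. `GridIndex` is a hypothetical name.

```python
import math
from collections import defaultdict


class GridIndex:
    """Illustrative 2-D grid with cell side eps for eps-neighborhood queries."""

    def __init__(self, points, eps):
        self.points = points
        self.eps = eps
        self.cells = defaultdict(list)      # (cell_x, cell_y) -> point indices
        for i, p in enumerate(points):
            self.cells[self._cell(p)].append(i)

    def _cell(self, p):
        return (math.floor(p[0] / self.eps), math.floor(p[1] / self.eps))

    def neighbors(self, i):
        """Indices within eps of points[i]; only the surrounding 3x3 cells are scanned."""
        cx, cy = self._cell(self.points[i])
        result = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in self.cells.get((cx + dx, cy + dy), ()):
                    if math.dist(self.points[i], self.points[j]) <= self.eps:
                        result.append(j)
        return sorted(result)


# Hypothetical data.
pts = [(0.0, 0.0), (0.5, 0.5), (2.0, 2.0), (2.4, 2.1), (5.0, 5.0)]
grid = GridIndex(pts, eps=1.0)
print(grid.neighbors(2))  # [2, 3]
```

Replacing a linear-scan neighborhood function with such a query leaves the OPTICS logic unchanged; only the cost of each ε-neighborhood query differs.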
CONCLUSION
• OPTICS computes an augmented cluster-ordering of the database objects.
• The main advantage of this approach, compared to the clustering algorithms proposed in the literature, is that it is not limited to one global parameter setting.
• Instead, the augmented cluster-ordering contains information that is equivalent to the density-based clusterings corresponding to a broad range of parameter settings, and it is thus a versatile basis for both automatic and interactive cluster analysis.
REFERENCES
• [1] Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Jörg Sander, “OPTICS: Ordering Points To Identify the Clustering Structure”, Proc. ACM SIGMOD’99 Int. Conf. on Management of Data, Philadelphia, PA, 1999.
• [2] Han, Kamber, Pei, “Data Mining: Concepts and Techniques”, Third Edition.
• [3] Stefan Brecheisen, Hans-Peter Kriegel, Martin Pfeifle, “Efficient Density-Based Clustering of Complex Objects”.
• [4] Class lecture slides on density-based clustering (DBSCAN).
THANK YOU
FOR YOUR CO-OPERATION
QUESTIONS??

More Related Content

What's hot (20)

PPTX
DBSCAN (2014_11_25 06_21_12 UTC)
Cory Cook
 
PPTX
DBSCAN : A Clustering Algorithm
Pınar Yahşi
 
PDF
K - Nearest neighbor ( KNN )
Mohammad Junaid Khan
 
PPTX
Density based clustering
YaswanthHariKumarVud
 
PPT
3.5 model based clustering
Krish_ver2
 
PDF
Data Mining: Association Rules Basics
Benazir Income Support Program (BISP)
 
PPTX
Data Mining: clustering and analysis
DataminingTools Inc
 
PPTX
K MEANS CLUSTERING
singh7599
 
PPT
K means Clustering Algorithm
Kasun Ranga Wijeweera
 
PDF
Performance Metrics for Machine Learning Algorithms
Kush Kulshrestha
 
PDF
Cluster analysis
Hohai university
 
PPTX
Forward and Backward chaining in AI
Megha Sharma
 
PPTX
K-Nearest Neighbor Classifier
Neha Kulkarni
 
PPT
2.4 rule based classification
Krish_ver2
 
PPTX
Machine learning clustering
CosmoAIMS Bassett
 
PPTX
Random forest algorithm
Rashid Ansari
 
PPTX
K means clustering
keshav goyal
 
PPTX
Introduction to Clustering algorithm
hadifar
 
PPT
3.7 outlier analysis
Krish_ver2
 
DBSCAN (2014_11_25 06_21_12 UTC)
Cory Cook
 
DBSCAN : A Clustering Algorithm
Pınar Yahşi
 
K - Nearest neighbor ( KNN )
Mohammad Junaid Khan
 
Density based clustering
YaswanthHariKumarVud
 
3.5 model based clustering
Krish_ver2
 
Data Mining: Association Rules Basics
Benazir Income Support Program (BISP)
 
Data Mining: clustering and analysis
DataminingTools Inc
 
K MEANS CLUSTERING
singh7599
 
K means Clustering Algorithm
Kasun Ranga Wijeweera
 
Performance Metrics for Machine Learning Algorithms
Kush Kulshrestha
 
Cluster analysis
Hohai university
 
Forward and Backward chaining in AI
Megha Sharma
 
K-Nearest Neighbor Classifier
Neha Kulkarni
 
2.4 rule based classification
Krish_ver2
 
Machine learning clustering
CosmoAIMS Bassett
 
Random forest algorithm
Rashid Ansari
 
K means clustering
keshav goyal
 
Introduction to Clustering algorithm
hadifar
 
3.7 outlier analysis
Krish_ver2
 

Similar to Optics ordering points to identify the clustering structure (20)

PPTX
Optics
RohitPaul52
 
PPTX
density based method and expectation maximization
Siva Priya
 
PPT
3.4 density and grid methods
Krish_ver2
 
PPTX
Fa18_P2.pptx
Md Abul Hayat
 
PDF
clustering density technidques in machine learning
ShymaPV
 
PDF
50120140501016
IAEME Publication
 
PDF
DBSCAN
ssuseraef7e0
 
PPTX
Could a Data Science Program use Data Science Insights?
Zachary Thomas
 
PPTX
Cluster Analysis.pptx
AdityaRajput317826
 
PPTX
Graph and Density Based Clustering
AyushAnand105
 
PPT
cluster analysis
sudesh regmi
 
PPTX
Data Mining Lecture_7.pptx
Subrata Kumer Paul
 
PPTX
Data mining Techniques
Sulman Ahmed
 
PDF
6 clustering
Viet-Trung TRAN
 
PPTX
Fast Single-pass K-means Clusterting at Oxford
MapR Technologies
 
PPTX
3b318431-df9f-4a2c-9909-61ecb6af8444.pptx
NANDHINIS900805
 
PDF
CPSC 340: Machine Learning and Data Mining More Clustering Andreas Lehrmann a...
gazmend16
 
PPTX
Dbscan
RohitPaul52
 
PPTX
Pattern recognition binoy k means clustering
108kaushik
 
Optics
RohitPaul52
 
density based method and expectation maximization
Siva Priya
 
3.4 density and grid methods
Krish_ver2
 
Fa18_P2.pptx
Md Abul Hayat
 
clustering density technidques in machine learning
ShymaPV
 
50120140501016
IAEME Publication
 
DBSCAN
ssuseraef7e0
 
Could a Data Science Program use Data Science Insights?
Zachary Thomas
 
Cluster Analysis.pptx
AdityaRajput317826
 
Graph and Density Based Clustering
AyushAnand105
 
cluster analysis
sudesh regmi
 
Data Mining Lecture_7.pptx
Subrata Kumer Paul
 
Data mining Techniques
Sulman Ahmed
 
6 clustering
Viet-Trung TRAN
 
Fast Single-pass K-means Clusterting at Oxford
MapR Technologies
 
3b318431-df9f-4a2c-9909-61ecb6af8444.pptx
NANDHINIS900805
 
CPSC 340: Machine Learning and Data Mining More Clustering Andreas Lehrmann a...
gazmend16
 
Dbscan
RohitPaul52
 
Pattern recognition binoy k means clustering
108kaushik
 
Ad

More from Rajesh Piryani (11)

PDF
Introduction to sentiment analysis
Rajesh Piryani
 
PDF
Gomory's cutting plane method
Rajesh Piryani
 
PDF
Monte carlo simulation
Rajesh Piryani
 
PPSX
Online Advertisements and the AdWords Problem
Rajesh Piryani
 
PDF
Hadoop
Rajesh Piryani
 
PDF
Tqm metrics
Rajesh Piryani
 
PPTX
(Project) Student grading system
Rajesh Piryani
 
PDF
Agile software development
Rajesh Piryani
 
PDF
(Paper Presentation) DSDV
Rajesh Piryani
 
PDF
(Paper Presentation) ZIGZAG: An Efficient Peer-to-Peer Scheme for Media Strea...
Rajesh Piryani
 
PDF
Address Binding Scheme
Rajesh Piryani
 
Introduction to sentiment analysis
Rajesh Piryani
 
Gomory's cutting plane method
Rajesh Piryani
 
Monte carlo simulation
Rajesh Piryani
 
Online Advertisements and the AdWords Problem
Rajesh Piryani
 
Tqm metrics
Rajesh Piryani
 
(Project) Student grading system
Rajesh Piryani
 
Agile software development
Rajesh Piryani
 
(Paper Presentation) DSDV
Rajesh Piryani
 
(Paper Presentation) ZIGZAG: An Efficient Peer-to-Peer Scheme for Media Strea...
Rajesh Piryani
 
Address Binding Scheme
Rajesh Piryani
 
Ad

Recently uploaded (20)

PDF
Generative AI: it's STILL not a robot (CIJ Summer 2025)
Paul Bradshaw
 
PDF
DIGESTION OF CARBOHYDRATES,PROTEINS,LIPIDS
raviralanaresh2
 
PPTX
grade 5 lesson matatag ENGLISH 5_Q1_PPT_WEEK4.pptx
SireQuinn
 
PDF
ARAL-Orientation_Morning-Session_Day-11.pdf
JoelVilloso1
 
PPT
Talk on Critical Theory, Part One, Philosophy of Social Sciences
Soraj Hongladarom
 
PPTX
How to Create a PDF Report in Odoo 18 - Odoo Slides
Celine George
 
PDF
Lesson 2 - WATER,pH, BUFFERS, AND ACID-BASE.pdf
marvinnbustamante1
 
PPTX
MENINGITIS: NURSING MANAGEMENT, BACTERIAL MENINGITIS, VIRAL MENINGITIS.pptx
PRADEEP ABOTHU
 
PDF
LAW OF CONTRACT (5 YEAR LLB & UNITARY LLB )- MODULE - 1.& 2 - LEARN THROUGH P...
APARNA T SHAIL KUMAR
 
PPTX
Growth and development and milestones, factors
BHUVANESHWARI BADIGER
 
PDF
Biological Bilingual Glossary Hindi and English Medium
World of Wisdom
 
PPTX
Stereochemistry-Optical Isomerism in organic compoundsptx
Tarannum Nadaf-Mansuri
 
PDF
The Constitution Review Committee (CRC) has released an updated schedule for ...
nservice241
 
PPTX
HYDROCEPHALUS: NURSING MANAGEMENT .pptx
PRADEEP ABOTHU
 
PDF
Isharyanti-2025-Cross Language Communication in Indonesian Language
Neny Isharyanti
 
PPTX
STAFF DEVELOPMENT AND WELFARE: MANAGEMENT
PRADEEP ABOTHU
 
PDF
ARAL_Orientation_Day-2-Sessions_ARAL-Readung ARAL-Mathematics ARAL-Sciencev2.pdf
JoelVilloso1
 
PPTX
SPINA BIFIDA: NURSING MANAGEMENT .pptx
PRADEEP ABOTHU
 
PPTX
ASRB NET 2023 PREVIOUS YEAR QUESTION PAPER GENETICS AND PLANT BREEDING BY SAT...
Krashi Coaching
 
PDF
Knee Extensor Mechanism Injuries - Orthopedic Radiologic Imaging
Sean M. Fox
 
Generative AI: it's STILL not a robot (CIJ Summer 2025)
Paul Bradshaw
 
DIGESTION OF CARBOHYDRATES,PROTEINS,LIPIDS
raviralanaresh2
 
grade 5 lesson matatag ENGLISH 5_Q1_PPT_WEEK4.pptx
SireQuinn
 
ARAL-Orientation_Morning-Session_Day-11.pdf
JoelVilloso1
 
Talk on Critical Theory, Part One, Philosophy of Social Sciences
Soraj Hongladarom
 
How to Create a PDF Report in Odoo 18 - Odoo Slides
Celine George
 
Lesson 2 - WATER,pH, BUFFERS, AND ACID-BASE.pdf
marvinnbustamante1
 
MENINGITIS: NURSING MANAGEMENT, BACTERIAL MENINGITIS, VIRAL MENINGITIS.pptx
PRADEEP ABOTHU
 
LAW OF CONTRACT (5 YEAR LLB & UNITARY LLB )- MODULE - 1.& 2 - LEARN THROUGH P...
APARNA T SHAIL KUMAR
 
Growth and development and milestones, factors
BHUVANESHWARI BADIGER
 
Biological Bilingual Glossary Hindi and English Medium
World of Wisdom
 
Stereochemistry-Optical Isomerism in organic compoundsptx
Tarannum Nadaf-Mansuri
 
The Constitution Review Committee (CRC) has released an updated schedule for ...
nservice241
 
HYDROCEPHALUS: NURSING MANAGEMENT .pptx
PRADEEP ABOTHU
 
Isharyanti-2025-Cross Language Communication in Indonesian Language
Neny Isharyanti
 
STAFF DEVELOPMENT AND WELFARE: MANAGEMENT
PRADEEP ABOTHU
 
ARAL_Orientation_Day-2-Sessions_ARAL-Readung ARAL-Mathematics ARAL-Sciencev2.pdf
JoelVilloso1
 
SPINA BIFIDA: NURSING MANAGEMENT .pptx
PRADEEP ABOTHU
 
ASRB NET 2023 PREVIOUS YEAR QUESTION PAPER GENETICS AND PLANT BREEDING BY SAT...
Krashi Coaching
 
Knee Extensor Mechanism Injuries - Orthopedic Radiologic Imaging
Sean M. Fox
 

Optics ordering points to identify the clustering structure

  • 1. (Paper Presentation) OPTICS-Ordering Points To Identify The Clustering Structure Presenter Anu Singha Asiya Naz Rajesh Piryani South Asian University
  • 2. OUTLINE  Introduction  Definition (Directly Density Reachable, Density Reachable, Density Connected,  OPTICS Algorithm  Example  Graphical Results April 30,2012 2
  • 3. CLUSTERING  Goal  Group objects into meaningful subclasses as part of an exploratory process to insight into data or as a preprocessing step for other algorithms.  Clustering Strategies  Hierarchical  Partitioning  k-means  Density Based April 30,2012 3
  • 4. DENSITY BASED CLUSTERING  Density-based Clustering locates regions of high density that are separated from one another by regions of low density.  Density = number of points within a specified radius (Eps) April 30,2012 4
  • 5. DENSITY BASED CLUSTERING Flat Clustering one level of clusters Hierarchical Clustering nested clusters e.g. density-based clustering algorithm DBSCAN [KDD 96] e.g. density-based clustering algorithm OPTICS [SIGMOD 99] April 30,2012 5
  • 6. INTRODUCTION  DBSCAN can cluster objects given input parameters such as  (Eps) (the maximum radius of a neighborhood) and  MinPts (the minimum number of points required in the neighborhood of a core object),  it encumbers users with the responsibility of selecting parameter values that will lead to the discovery of acceptable clusters.  Such parameter settings are usually empirically set and difficult to determine.  Moreover, real-world, high-dimensional data sets often have very skewed distributions such that their intrinsic clustering structure may not be well characterized by a single set of global density parameters. April 30,2012 6
  • 7. INTRODUCTION  density-based clusters are monotonic with respect to the neighborhood threshold.  In DBSCAN, for a fixed MinPts value and two neighborhood thresholds,  (Eps) 1 < (Eps) 2, a cluster C with respect to (Eps)1 and  MinPts must be a subset of a cluster C’ with respect to (Eps) 2 and MinPts.  This means that if two objects are in a density-based cluster, they must also be in a cluster with a lower density requirement.  Different clusters may have very different densities  Clusters may be in hierarchies April 30,2012 7 To overcome the difficulty in using one set of global parameters in clustering analysis, a cluster analysis method called OPTICS was proposed.
  • 8. OPTICS  in figure 3, where  C1 and C2 are density-based clusters with respect to e2 < e1  and C is a density based cluster with respect to e1 completely containing the sets C1 and C2.  for a constant MinPts-value, density- based clusters with respect to a higher density (i.e. a lower value for e) are completely contained in density- connected sets with respect to a lower density (i.e. a higher value for e). April 30,2012 8
  • 9. OPTICS  To produce a consistent result  obey a specific order in which objects are processed when expanding a cluster.  select an object which is density-reachable with respect to the lowest ε value  to guarantee that clusters w.r.t higher density (i.e. smaller e values) are finished first.  OPTICS works in principle like such an extended DBSCAN algorithm for an infinite number of distance parameters εi which are smaller than a “generating distance” ε (i.e. 0 ≤ εi ≤ ε).  The only difference is that we do not assign cluster memberships.  Instead, we store the order in which the objects are processed and the information which would be used by an extended DBSCAN algorithm to assign cluster memberships if this were at all possible for an infinite number of parameters). April 30,2012 9
  • 10. OPTICS  OPTICS does not explicitly produce a data set clustering.  It outputs a cluster ordering.  It is linear list of all objects under analysis and  represents the density-based clustering structure of the data.  Objects in a denser cluster are listed closer to each other in the cluster ordering.  Ordering is equivalent to density-based clustering obtained from a wide range of parameter settings.  Thus, OPTICS does not require the user to provide a specific density threshold.  The cluster ordering can be used to extract basic clustering information (e.g., cluster centers, or arbitrary-shaped clusters), derive the intrinsic clustering structure, as well as provide a visualization of the clustering April 30,2012 10
  • 11. OPTICS (CONTINUED..)  To construct the different clusterings simultaneously, the objects are processed in a specific order.  This order selects an object that is density-reachable with respect to the lowest (Eps) value so that clusters with higher density (lower (Eps)) will be finished first.  Based on this idea, OPTICS needs two important pieces of information per object:  Core Distance  Reachability Distance April 30,2012 11 It was presented by Mihael Ankerst, Markus M. Breunig,Hans-Peter Kriegel and Jörg Sander.
  • 12. TERMINOLOGIES  ε-Neighborhood  Objects within a radius of ε from an object. (epsilon-neighborhood)  Core objects  ε-Neighborhood of an object contains at least MinPts of objects April 30,2012 12 q p εε ε-Neighborhood of p ε-Neighborhood of q p is a core object (MinPts = 4) q is not a core object
  • 13. TERMINOLOGIES  Directly Density Reachable  An object q is directly density-reachable from object p if q is within the ε- Neighborhood of p and p is a core object April 30,2012 13 q p εε  q is directly density-reachable from p  p is not directly density- reachable from q?
  • 14. TERMINOLOGIES  Density Reachable  An object p is density-reachable from q w.r.t ε and MinPts if there is a chain of objects p1,…,pn, with p1=q, pn=p such that pi+1is directly density- reachable from pi w.r.t ε and MinPts for all 1 <= i <= n April 30,2012 14 p  q is density-reachable from p  p is not density- reachable from q>  Transitive closure of direct density-Reachability, asymmetric q
  • 15. TERMINOLOGIES  Definition: core-distance  Definition: reachability-distance       otherwise)(dist |),(rangeQuery|if )(distcore , oMinPts MinPtso oMinPts   reach dist ( , ) max(core dist ( ),dist( , )), ,   MinPts MinPtsp o o p o core-distance(o) o reachability-distance(p,o) p p reachability-distance(p,o)  MinPts = 5 April 30,2012 15
  • 16. ABOUT OPTICS COMPUTATION  It computes an ordering of all objects in a given database. And  It stores the core-distance and a suitable reachability-distance for each object in the database.  OPTICS maintains a list called OrderSeeds to generate the output ordering.  Objects in OrderSeeds  are sorted by the reachability-distance from their respective closest core objects,  that is, by the smallest reachability-distance of each object. April 30,2012 16
  • 17. ABOUT OPTICS ALGORITHM  Begin with an arbitrary object from the input database as the current object, p.  It retrieves the ε-neighborhood of p, determines the core-distance, and sets the reachability-distance to undefined.  The current object, p, is then written to output.  If p is not a core object,  OPTICS simply moves on to the next object in the OrderSeeds list (or the input database if OrderSeeds is empty). April 30,2012 17
  • 18. ABOUT OPTICS ALGORITHM  If p is a core object,  then for each object, q, in the ε-neighborhood of p,  OPTICS updates its reachability-distance from p  and inserts q into OrderSeeds if q has not yet been processed.  The iteration continues until the input is fully consumed and OrderSeeds is empty. April 30,2012 18
  • 19. ALGORITHM  OPTICS (SetOfObjects, e, MinPts, OrderedFile)  OrderedFile.open();  FOR i FROM 1 TO SetOfObjects.size DO  Object := SetOfObjects.get(i);  IF NOT Object.Processed THEN  ExpandClusterOrder(SetOfObjects, Object, e, MinPts, OrderedFile)  OrderedFile.close();  END; // OPTICS April 30,2012 19
  • 20. PROCEDURE FOR ExpandClusterOrder  ExpandClusterOrder(SetOfObjects, Object, ε, MinPts, OrderedFile);  neighbors := SetOfObjects.neighbors(Object, ε);  Object.Processed := TRUE;  Object.reachability_distance := UNDEFINED;  Object.setCoreDistance(neighbors, ε, MinPts);  OrderedFile.write(Object);  IF Object.core_distance <> UNDEFINED THEN  OrderSeeds.update(neighbors, Object);  WHILE NOT OrderSeeds.empty() DO  currentObject := OrderSeeds.next();  neighbors:=SetOfObjects.neighbors(currentObject, ε);  currentObject.Processed := TRUE;  currentObject.setCoreDistance(neighbors, ε, MinPts);  OrderedFile.write(currentObject);  IF currentObject.core_distance<>UNDEFINED THEN  OrderSeeds.update(neighbors, currentObject);  END; // ExpandClusterOrder April 30,2012 20 object is simply written to the file OrderedFile with its coredistance and its current reachability-distance.
  • 21. OrderSeeds::update()
OrderSeeds::update(neighbors, CenterObject);
    c_dist := CenterObject.core_distance;
    FORALL Object FROM neighbors DO
        IF NOT Object.Processed THEN
            new_r_dist := max(c_dist, CenterObject.dist(Object));
            IF Object.reachability_distance = UNDEFINED THEN
                Object.reachability_distance := new_r_dist;
                insert(Object, new_r_dist);
            ELSE // Object already in OrderSeeds
                IF new_r_dist < Object.reachability_distance THEN
                    Object.reachability_distance := new_r_dist;
                    decrease(Object, new_r_dist);
END; // OrderSeeds::update
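The three procedures above (OPTICS, ExpandClusterOrder and OrderSeeds::update) can be transliterated into a minimal Python sketch, assuming 2-D points, Euclidean distance, a linear-scan region query, and an ε-neighborhood that includes the object itself. Python's heapq module has no decrease-key operation, so this sketch swaps in lazy deletion: a smaller reachability-distance is pushed again and stale entries are skipped when popped. The function names are hypothetical, and the returned list stands in for OrderedFile.

```python
import heapq
import math
from typing import Optional, Sequence

Point = tuple[float, float]
Entry = tuple[int, Optional[float], Optional[float]]  # (index, reachability, core-distance)


def optics(points: Sequence[Point], eps: float, min_pts: int) -> list[Entry]:
    """Return the cluster-ordering; None stands for UNDEFINED."""
    n = len(points)
    processed = [False] * n
    reach: list[Optional[float]] = [None] * n
    core: list[Optional[float]] = [None] * n
    ordered: list[Entry] = []  # stands in for OrderedFile

    def neighbors(i: int) -> list[int]:
        # Linear-scan region query; the result includes i itself (assumption).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    def process(i: int) -> list[int]:
        # Mark as processed, set the core-distance, write the object to the ordering.
        nbrs = neighbors(i)
        processed[i] = True
        if len(nbrs) >= min_pts:
            core[i] = sorted(math.dist(points[i], points[j]) for j in nbrs)[min_pts - 1]
        ordered.append((i, reach[i], core[i]))
        return nbrs

    def update(seeds: list, nbrs: list[int], center: int) -> None:
        # OrderSeeds::update, using lazy deletion instead of decrease().
        c_dist = core[center]
        for j in nbrs:
            if not processed[j]:
                new_r = max(c_dist, math.dist(points[center], points[j]))
                if reach[j] is None or new_r < reach[j]:
                    reach[j] = new_r
                    heapq.heappush(seeds, (new_r, j))

    for i in range(n):
        if processed[i]:
            continue
        # ExpandClusterOrder for a new start object.
        reach[i] = None
        nbrs = process(i)
        if core[i] is None:
            continue
        seeds: list[tuple[float, int]] = []
        update(seeds, nbrs, i)
        while seeds:
            _, cur = heapq.heappop(seeds)
            if processed[cur]:
                continue  # stale entry left behind by an earlier, larger key
            nbrs = process(cur)
            if core[cur] is not None:
                update(seeds, nbrs, cur)
    return ordered
```

Ties between equal reachability-distances are broken here by the smaller index; the pseudocode does not prescribe a tie-breaking rule. With a neighborhood query that scans the whole database, this sketch has the O(n²) run-time discussed on the performance slides.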
  • 22.
- Having generated the augmented cluster-ordering of a database with respect to ε and MinPts, we can extract any density-based clustering from this order with respect to MinPts and a clustering-distance ε' ≤ ε
  - by simply "scanning" the cluster-ordering
  - and assigning cluster memberships depending on the reachability-distance and the core-distance of the objects.
- The fact that such an extraction is possible demonstrates that the cluster-ordering of a data set actually contains the information about the intrinsic clustering structure of that data set (up to the generating distance ε).
  • 23. ExtractDBSCAN-Clustering
ExtractDBSCAN-Clustering (ClusterOrderedObjs, ε', MinPts)
    // Precondition: ε' ≤ generating dist ε for ClusterOrderedObjs
    ClusterId := NOISE;
    FOR i FROM 1 TO ClusterOrderedObjs.size DO
        Object := ClusterOrderedObjs.get(i);
        IF Object.reachability_distance > ε' THEN
            // UNDEFINED > ε ≥ ε'
            IF Object.core_distance ≤ ε' THEN
                ClusterId := nextId(ClusterId);
                Object.clusterId := ClusterId;
            ELSE
                Object.clusterId := NOISE;
        ELSE // Object.reachability_distance ≤ ε'
            Object.clusterId := ClusterId;
END; // ExtractDBSCAN-Clustering
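The same single scan can be sketched in Python, assuming the cluster-ordering is a list of (object id, reachability-distance, core-distance) triples with None for UNDEFINED. This input format, NOISE = -1, and the use of consecutive integers for nextId are assumptions made for illustration.

```python
from typing import Optional, Sequence

NOISE = -1


def extract_dbscan(ordered: Sequence[tuple[int, Optional[float], Optional[float]]],
                   eps_prime: float) -> dict[int, int]:
    """Assign cluster ids by scanning the cluster-ordering once.
    Precondition (not checked): eps_prime <= the generating distance eps."""
    labels: dict[int, int] = {}
    cluster_id = NOISE
    next_id = 0  # hypothetical nextId(): consecutive integers
    for obj, reach, core in ordered:
        if reach is None or reach > eps_prime:  # UNDEFINED counts as > eps'
            if core is not None and core <= eps_prime:
                cluster_id = next_id  # a core object starts a new cluster
                next_id += 1
                labels[obj] = cluster_id
            else:
                labels[obj] = NOISE
        else:
            labels[obj] = cluster_id  # density-reachable: joins the current cluster
    return labels
```

For the ordering [(0, None, 1.0), (1, 1.0, 1.0), (2, 1.0, 1.0), (3, None, 1.0), (4, 1.0, 1.0), (5, 1.0, 1.0), (6, None, None)], ε' = 1.5 yields two clusters {0, 1, 2} and {3, 4, 5} plus noise object 6, while ε' = 0.5 labels every object as noise.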
  • 24. OPTICS ALGORITHM EXAMPLE
- Example database: 2-dimensional, 16 points (A–N, P, R); ε = 44, MinPts = 3
- [Figure: the 16 points and an empty reachability plot (vertical axis "reach", marked at 44)]
- Seedlist: empty
  • 25. OPTICS ALGORITHM EXAMPLE
- Current object: A (its core-distance is marked in the figure)
- Reachability plot so far: A
- Seedlist: (B, 40), (I, 40)
  • 26. OPTICS ALGORITHM EXAMPLE
- Current object: B
- Reachability plot so far: A, B
- Seedlist: (I, 40), (C, 40)
  • 27. OPTICS ALGORITHM EXAMPLE
- Current object: I
- Reachability plot so far: A, B, I
- Seedlist: (J, 20), (K, 20), (L, 31), (C, 40), (M, 40), (R, 43)
  • 28. OPTICS ALGORITHM EXAMPLE
- Current object: J
- Reachability plot so far: A, B, I, J
- Seedlist: (L, 19), (K, 20), (R, 21), (M, 30), (P, 31), (C, 40)
  • 29. OPTICS ALGORITHM EXAMPLE
- Current object: L
- Reachability plot so far: A, B, I, J, L, …
- Seedlist: (M, 18), (K, 18), (R, 20), (P, 21), (N, 35), (C, 40)
  • 30. OPTICS ALGORITHM EXAMPLE
- Seedlist: empty
- Final cluster order (reachability plot): A, B, I, J, L, M, K, N, R, P, C, D, F, G, E, H
- [Figure: reachability plot with vertical axis "reach", marked at 44]
  • 32. GRAPHICAL REPRESENTATION
- A data set's cluster ordering can be represented graphically.
- This helps to visualize and understand the clustering structure of a data set.
  • 33. GRAPHICAL REPRESENTATION
- The figure on this slide shows the reachability plot for a simple 2-D data set, which gives a general overview of how the data are structured and clustered.
- The data objects are plotted in the cluster order (horizontal axis) together with their respective reachability-distances (vertical axis).
- The three Gaussian "bumps" in the plot reflect three clusters in the data set.
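Because a reachability plot is just reachability-distance against position in the cluster order, a crude text version can be sketched in Python. In such a plot, clusters typically appear as runs of short bars (valleys) separated by tall bars. The helper name, the bar width, and drawing UNDEFINED reachability as a full bar at ε are assumptions made for this sketch.

```python
from typing import Optional, Sequence


def text_reachability_plot(names: Sequence[str], reach: Sequence[Optional[float]],
                           eps: float, width: int = 40) -> list[str]:
    """Render one horizontal bar per object, in cluster order.
    UNDEFINED (None) reachability is drawn as a full bar at eps (assumption)."""
    rows = []
    for name, r in zip(names, reach):
        value = eps if r is None else min(r, eps)
        rows.append(f"{name:>3} " + "#" * round(width * value / eps))
    return rows


# Hypothetical ordering with two valleys: (p0, p1, p2) and (p3, p4).
for line in text_reachability_plot(["p0", "p1", "p2", "p3", "p4"],
                                   [None, 1.0, 1.0, None, 1.0], eps=2.0, width=8):
    print(line)
```

Each tall bar marks an object that is not density-reachable from the objects before it within ε, so the short bars that follow it form one valley.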
  • 34. ALGORITHM PERFORMANCE
- The authors performed an extensive performance test using different data sets and different parameter settings.
- It turned out that the run-time of OPTICS was almost constantly 1.6 times the run-time of DBSCAN.
- This is not surprising, since the run-time of both OPTICS and DBSCAN is heavily dominated by the run-time of the ε-neighborhood queries,
  - which must be performed for each object in the database, i.e., the run-time of both algorithms is O(n × run-time of an ε-neighborhood query).
  • 35. ALGORITHM PERFORMANCE
- To retrieve the ε-neighborhood of an object o, a region query with center o and radius ε is used.
- Without any index support, answering such a region query requires a scan through the whole database.
  - In this case, the run-time of OPTICS is O(n²).
- If a tree-based spatial index can be used, the run-time is reduced to O(n log n).
  • 36. ALGORITHM PERFORMANCE
- In the worst case, the height of such a tree-based index is O(log n) for a database of n objects, and, at least in low-dimensional spaces, a query with a "small" query region has to traverse only a limited number of paths.
- Furthermore, with direct access to the ε-neighborhood, e.g., if the objects are organized in a grid, the run-time is further reduced to O(n), because in a grid the complexity of a single neighborhood query is O(1).
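The contrast between the linear scan of slide 35 and the grid access of slide 36 can be sketched in Python for 2-D points. The GridIndex class is a hypothetical helper: it uses square cells of side ε, so every point within ε of a query point lies in the 3x3 block of cells around it. A grid query is therefore O(1) only under the assumption that each cell holds a bounded number of points, whereas the linear scan always costs O(n) per query, or O(n²) over all n queries.

```python
import math
from collections import defaultdict
from typing import Sequence

Point = tuple[float, float]


def linear_region_query(points: Sequence[Point], center: int, eps: float) -> list[int]:
    # Scans the whole database: O(n) per query, O(n^2) for all n objects.
    return [j for j, q in enumerate(points) if math.dist(points[center], q) <= eps]


class GridIndex:
    """Uniform 2-D grid with cell side eps (hypothetical helper)."""

    def __init__(self, points: Sequence[Point], eps: float):
        self.points = points
        self.eps = eps
        self.cells: dict[tuple[int, int], list[int]] = defaultdict(list)
        for j, (x, y) in enumerate(points):
            self.cells[self._cell(x, y)].append(j)

    def _cell(self, x: float, y: float) -> tuple[int, int]:
        return (math.floor(x / self.eps), math.floor(y / self.eps))

    def region_query(self, center: int) -> list[int]:
        # Only the 3x3 block of cells around the query point can hold eps-neighbors.
        p = self.points[center]
        cx, cy = self._cell(*p)
        result = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in self.cells.get((cx + dx, cy + dy), []):
                    if math.dist(p, self.points[j]) <= self.eps:
                        result.append(j)
        return result
```

Both queries return the same neighbor set (the grid version in cell order rather than index order), so either can serve as the region query inside OPTICS; only the cost per query differs.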
  • 37. CONCLUSION
- OPTICS computes an augmented cluster-ordering of the database objects.
- The main advantage of this approach, compared to the clustering algorithms proposed in the literature, is that it is not limited to one global parameter setting.
- Instead, the augmented cluster-ordering contains information that is equivalent to the density-based clusterings corresponding to a broad range of parameter settings, and it is thus a versatile basis for both automatic and interactive cluster analysis.
  • 38. REFERENCES
[1] Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Jörg Sander, "OPTICS: Ordering Points To Identify the Clustering Structure," Proc. ACM SIGMOD'99 Int. Conf. on Management of Data, Philadelphia, PA, 1999.
[2] Han, Kamber, Pei, "Data Mining: Concepts and Techniques," Third Edition.
[3] Stefan Brecheisen, Hans-Peter Kriegel, Martin Pfeifle, "Efficient Density-Based Clustering of Complex Objects."
[4] Class lecture slides on density-based clustering (DBSCAN).
  • 39. THANK YOU FOR YOUR CO-OPERATION