Clustering
Density and Grid Based
Density based methods
 Clusters – dense regions of objects
 Low density regions – Noise
 DBSCAN
 Density Based Spatial Clustering of Applications with
Noise
 OPTICS
 Ordering Points To Identify the Clustering Structure
 DENCLUE
 DENsity Based CLUstEring
DBSCAN
 Cluster – maximal set of density connected points
 Grows regions with sufficiently high density into clusters
 ε-neighborhood – the objects within radius ε of a given object
 Core object – an object whose ε-neighborhood contains at least MinPts objects
 Directly Density Reachable
 An object p is directly density reachable from object q if
p is within the ε-neighborhood of q and q is a core
object
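The definitions above can be sketched in a few lines of Python. This is only an illustration under assumptions not in the slides: Euclidean distance, a brute-force neighborhood search, and the hypothetical helper names `eps_neighborhood`, `is_core` and `directly_density_reachable`.

```python
from math import dist


def eps_neighborhood(points, i, eps):
    """Indices of all points within distance eps of points[i] (i itself included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def is_core(points, i, eps, min_pts):
    """A core object has at least min_pts objects in its eps-neighborhood."""
    return len(eps_neighborhood(points, i, eps)) >= min_pts


def directly_density_reachable(points, p, q, eps, min_pts):
    """True if p lies in the eps-neighborhood of q and q is a core object."""
    return is_core(points, q, eps, min_pts) and dist(points[p], points[q]) <= eps


# Illustrative data: four tightly packed points, one border point, one far point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.6, 0.0), (5.0, 5.0)]
print(directly_density_reachable(points, 4, 1, eps=0.55, min_pts=4))
print(directly_density_reachable(points, 1, 4, eps=0.55, min_pts=4))
```

The two calls differ only in the order of the arguments, which shows that the relation is not symmetric: a border point can be directly density reachable from a core object, but not the other way round.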
[Figure: objects p and q with MinPts = 5]
DBSCAN
 Density Reachable
 An object p is density reachable from q if there is
a chain of objects $p_1, \ldots, p_n$ with $p_1 = q$ and $p_n = p$ such that
$p_{i+1}$ is directly density reachable from $p_i$
[Figure: objects q, p1 and p]
DBSCAN
 Density Connected
An object p is density connected to object
q if there is an object o such that both p and q are
density reachable from o.
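Density reachability and density connectivity can be illustrated with a breadth-first search that follows chains of core objects. The sketch below assumes Euclidean distance and brute-force neighbor search; the function names are hypothetical and chosen only for this example.

```python
from collections import deque
from math import dist


def neighbors(points, i, eps):
    """Indices of all points within distance eps of points[i]."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def density_reachable_from(points, q, eps, min_pts):
    """Set of indices density reachable from q (empty if q is not a core object)."""
    if len(neighbors(points, q, eps)) < min_pts:
        return set()
    reached, queue = {q}, deque([q])
    while queue:
        cur = queue.popleft()
        nbrs = neighbors(points, cur, eps)
        if len(nbrs) < min_pts:
            continue  # border object: reachable, but no chain continues through it
        for j in nbrs:
            if j not in reached:
                reached.add(j)
                queue.append(j)
    return reached


def density_connected(points, p, q, eps, min_pts):
    """True if some object o exists from which both p and q are density reachable."""
    return any(
        {p, q} <= density_reachable_from(points, o, eps, min_pts)
        for o in range(len(points))
    )


# Illustrative data: a chain of five points plus one isolated point.
chain = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (10.0, 0.0)]
print(density_reachable_from(chain, 2, eps=1.1, min_pts=3))
print(density_connected(chain, 0, 4, eps=1.1, min_pts=3))
```

In this data the two end points of the chain are border objects. Neither is density reachable from the other, yet they are density connected through the core objects between them.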
[Figure: objects p, q and o]
DBSCAN
 Arbitrarily select a point p
 Retrieve all points density-reachable from p
 If p is a core point, a cluster is formed.
 If p is a border point, no points are density-reachable
from p, and DBSCAN visits the next point of the
database.
 Continue the process until all of the points have been
processed.
 Complexity: O(n log n) with a spatial index, O(n²) without one
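The procedure above can be sketched as follows. This is a minimal illustration rather than a reference implementation: it assumes Euclidean distance and uses a linear scan for every neighborhood query, so it runs in O(n²). The label value `NOISE = -1` and the function name `dbscan` are choices made for this example.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for noise."""

    def region(i):
        # Brute-force eps-neighborhood query (includes i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means "not yet visited"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = region(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is also a core point: keep growing
    return labels


# Illustrative data: two square groups of four points and one isolated point.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (5, 5)]
print(dbscan(data, eps=1.5, min_pts=3))
```

Replacing `region` with a query against a spatial index is what would bring the cost down towards O(n log n).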
OPTICS: A Cluster-Ordering Method
 OPTICS: Ordering Points To Identify the Clustering
Structure
 Produces a special order of the database with respect
to its density-based clustering structure
 Good for both automatic and interactive cluster
analysis, including finding intrinsic clustering structure
 Can be represented graphically or using visualization
techniques
OPTICS
 In DBSCAN, for a constant MinPts value, density-based
clusters with respect to a higher density (lower value of
ε) are completely contained in the clusters obtained with respect to a lower density.
 DBSCAN is extended so that objects are processed in a
specific order.
 Always selects the object that is density-reachable with respect to the lowest
ε value, so that higher-density clusters are finished first
 Core distance of an object p: smallest ε′ value that makes p a
core object
 Reachability distance of an object q with respect to p = max
(core-distance of p, d(p,q))
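The two distances can be written down directly. The sketch below assumes Euclidean distance, counts p itself among its neighbors, and represents an undefined distance as `math.inf`. These are conventions chosen for this example, not something the slides prescribe.

```python
from math import dist, inf


def core_distance(points, p, eps, min_pts):
    """Smallest radius that gives p min_pts neighbors, or inf if p is not core within eps."""
    distances = sorted(dist(points[p], q) for q in points)
    if len(distances) < min_pts or distances[min_pts - 1] > eps:
        return inf
    return distances[min_pts - 1]


def reachability_distance(points, q, p, eps, min_pts):
    """max(core-distance of p, d(p, q)); inf when p is not a core object."""
    return max(core_distance(points, p, eps, min_pts), dist(points[p], points[q]))


# Illustrative data: four points on a line.
line = [(0, 0), (1, 0), (2, 0), (4, 0)]
print(core_distance(line, 0, eps=3.0, min_pts=3))
print(reachability_distance(line, 1, 0, eps=3.0, min_pts=3))
```

Taking the maximum means that objects very close to p are all assigned at least the core distance, which smooths the reachability values inside a dense region.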
OPTICS
 Complexity: O(n log n) with a spatial index
OPTICS
[Figure: reachability plot showing reachability-distance (with ε, ε′ and "undefined" marked) against the cluster-order of the objects]
DENCLUE: using density functions
 DENsity-based CLUstEring
 Major features
 Solid mathematical foundation
 Good for data sets with large amounts of noise
 Allows a compact mathematical description of arbitrarily
shaped clusters in high-dimensional data sets
 Significantly faster than existing algorithms (faster than
DBSCAN by a factor of up to 45)
 But needs a large number of parameters
DENCLUE
 Influence function: describes the impact of a data point within its
neighborhood.
 x, y – objects in $F^d$, a d-dimensional input space
 Influence of object y on x is:
\[ f_B^y(x) = f_B(x, y) \]
 Can be determined by distance:
\[ f_{\mathrm{square}}(x, y) = \begin{cases} 0 & \text{if } d(x, y) > \sigma \\ 1 & \text{otherwise} \end{cases} \]
\[ f_{\mathrm{Gaussian}}(x, y) = e^{-\frac{d(x, y)^2}{2\sigma^2}} \]
 Overall density of the data space can be calculated as the sum of
the influence functions of all data points.
 Clusters can be determined mathematically by identifying density
attractors.
 Density attractors are local maxima of the overall density function.
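As an illustration of the influence and density functions, the following sketch evaluates the square-wave and Gaussian influence and sums them over a data set. Euclidean distance is assumed for d(x, y), and the function names are hypothetical.

```python
from math import dist, exp


def square_wave_influence(x, y, sigma):
    """0 if d(x, y) > sigma, 1 otherwise."""
    return 0.0 if dist(x, y) > sigma else 1.0


def gaussian_influence(x, y, sigma):
    """exp(-d(x, y)^2 / (2 * sigma^2))."""
    return exp(-dist(x, y) ** 2 / (2 * sigma ** 2))


def density(x, data, sigma, influence=gaussian_influence):
    """Overall density at x: sum of the influence of every data point on x."""
    return sum(influence(x, y, sigma) for y in data)


# Illustrative data: two nearby points and one distant point.
sample = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
print(density((0.0, 0.0), sample, sigma=1.0, influence=square_wave_influence))
print(density((0.0, 0.0), sample, sigma=1.0))
```

With the square-wave function the density is simply the number of data points within distance σ, while the Gaussian function gives a smooth surface whose local maxima are the density attractors.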
DENCLUE
 Density attractor – local maximum of the overall density
function
 A point x is said to be density attracted to a density
attractor x* if there exists a set of points $x_0, x_1, \ldots, x_k$
such that $x_0 = x$ and $x_k = x^*$ and the gradient at $x_{i-1}$ is
in the direction of $x_i$
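One way to picture density attraction is gradient hill climbing on the Gaussian density function: starting from x, each step moves a short distance in the direction of the gradient until the density stops increasing. The sketch below assumes a fixed step length and Euclidean distance. The names `climb_to_attractor` and `step` are hypothetical, and the gradient is computed only up to the positive factor 1/σ², which does not change its direction.

```python
from math import dist, exp


def density(x, data, sigma):
    """Gaussian density at x: sum of exp(-d(x, y)^2 / (2 sigma^2)) over the data."""
    return sum(exp(-dist(x, y) ** 2 / (2 * sigma ** 2)) for y in data)


def density_gradient(x, data, sigma):
    """Direction of steepest ascent of the Gaussian density at x."""
    grad = [0.0] * len(x)
    for y in data:
        weight = exp(-dist(x, y) ** 2 / (2 * sigma ** 2))
        for k in range(len(x)):
            grad[k] += (y[k] - x[k]) * weight
    return grad


def climb_to_attractor(x, data, sigma, step=0.05, max_iter=1000):
    """Follow the gradient from x until the density no longer increases."""
    x = tuple(x)
    for _ in range(max_iter):
        grad = density_gradient(x, data, sigma)
        norm = sum(g * g for g in grad) ** 0.5
        if norm == 0.0:
            break
        nxt = tuple(xi + step * g / norm for xi, g in zip(x, grad))
        if density(nxt, data, sigma) <= density(x, data, sigma):
            break  # no further ascent: x approximates a density attractor
        x = nxt
    return x


# Illustrative data: two small groups of points far apart.
groups = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2),
          (5.0, 5.0), (5.2, 5.0), (5.0, 5.2), (5.2, 5.2)]
print(climb_to_attractor((0.8, 0.8), groups, sigma=0.5))
print(climb_to_attractor((4.5, 4.5), groups, sigma=0.5))
```

Points whose climbs end near the same attractor would be grouped together. With a fixed step the result is only accurate to about one step length.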
DENCLUE
 Center defined clusters
 For a density attractor x*: the subset of points that are
density attracted by x*, where the density function at x* is
no less than a threshold ξ
 Others are outliers
 Arbitrary shape cluster
 A set of density attractors together with the subsets C of points density attracted to them
 There should be a path from each density attractor to
another along which the density function value at each point is
no less than ξ
Grid Based Methods
 Uses a Multi-resolution grid data structure
 Quantizes space into a finite number of cells
that form a grid structure
 Fast processing time
 STING
 WaveCluster
 CLIQUE – CLustering In QUEst
STING
 STatistical Information Grid
 Spatial area is divided into rectangular cells
 Several levels of cells – at different levels of
resolution
 Each high-level cell is partitioned into several lower-level
cells
 Statistical attributes are stored in each cell
 Mean, Maximum, Minimum
STING
 Parameters of higher level cells are computed
from those at lower levels
 To answer queries
 Identify the level at which to start
 Estimate each cell’s relevance to the query
 Process only the relevant cells at the next lower level
 Continue down to the lowest level
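The bottom-up computation of cell parameters can be sketched as below. The example assumes a square bottom-level grid whose side is a power of two, four children per parent cell, and a per-cell count in addition to the mean, minimum and maximum. The count is needed to weight the child means, and the class and function names are hypothetical. Query answering itself is not shown.

```python
from dataclasses import dataclass
from math import inf


@dataclass
class CellStats:
    count: int
    mean: float
    minimum: float
    maximum: float


def leaf_stats(values):
    """Statistics of one bottom-level cell from the attribute values it contains."""
    if not values:
        return CellStats(0, 0.0, inf, -inf)
    return CellStats(len(values), sum(values) / len(values), min(values), max(values))


def merge(children):
    """Parameters of a higher-level cell computed from its lower-level cells."""
    n = sum(c.count for c in children)
    if n == 0:
        return CellStats(0, 0.0, inf, -inf)
    mean = sum(c.count * c.mean for c in children) / n
    return CellStats(n, mean,
                     min(c.minimum for c in children),
                     max(c.maximum for c in children))


def build_levels(bottom):
    """Return all grid levels, from the bottom grid up to the single root cell."""
    levels = [bottom]
    while len(levels[-1]) > 1:
        g = levels[-1]
        half = len(g) // 2
        levels.append([
            [merge([g[2 * r][2 * c], g[2 * r][2 * c + 1],
                    g[2 * r + 1][2 * c], g[2 * r + 1][2 * c + 1]])
             for c in range(half)]
            for r in range(half)
        ])
    return levels


# Illustrative 2 x 2 bottom level holding values of a single attribute.
bottom = [[leaf_stats([1, 3]), leaf_stats([5])],
          [leaf_stats([]), leaf_stats([7, 9, 11])]]
root = build_levels(bottom)[-1][0][0]
print(root)
```

Because each parent is derived only from its children, the raw data needs to be scanned once, when the bottom-level cells are filled.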
STING
 Computation is query independent
 Parallel processing – supported
 Data is processed in a single pass
 Quality depends on granularity
WaveCluster
 A multi-resolution clustering approach which applies
wavelet transform to the feature space
 A wavelet transform is a signal processing technique
that decomposes a signal into different frequency sub-bands.
 Both grid-based and density-based
 Input parameters:
 Number of grid cells for each dimension
 The wavelet, and the number of applications of the wavelet
transform
WaveCluster
 Using wavelet transform to find clusters
 Summarises the data by imposing a multidimensional
grid structure onto data space
 These multidimensional spatial data objects are
represented in an n-dimensional feature space
 Apply the wavelet transform to the feature space to find the
dense regions in the feature space
 Apply the wavelet transform multiple times, which results in
clusters at different scales from fine to coarse
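The steps above can be sketched for two-dimensional data. The example is a simplification under stated assumptions: a square grid with an even number of cells per side, a single level of the Haar wavelet keeping only the low-frequency (average) sub-band and ignoring normalization, and clusters taken as 4-connected groups of cells above a threshold. All function names and the threshold are hypothetical.

```python
def quantize(points, cells, lo, hi):
    """Count the points falling in each cell of a cells x cells grid over [lo, hi)."""
    grid = [[0.0] * cells for _ in range(cells)]
    width = (hi - lo) / cells
    for x, y in points:
        r = min(int((y - lo) / width), cells - 1)
        c = min(int((x - lo) / width), cells - 1)
        grid[r][c] += 1
    return grid


def haar_approximation(grid):
    """One level of the 2-D Haar transform, keeping only the average sub-band."""
    half = len(grid) // 2
    return [
        [(grid[2 * r][2 * c] + grid[2 * r][2 * c + 1]
          + grid[2 * r + 1][2 * c] + grid[2 * r + 1][2 * c + 1]) / 4
         for c in range(half)]
        for r in range(half)
    ]


def connected_dense_cells(grid, threshold):
    """Label 4-connected groups of cells whose value exceeds threshold (0 = not dense)."""
    n = len(grid)
    labels = [[0] * n for _ in range(n)]
    current = 0
    for r in range(n):
        for c in range(n):
            if grid[r][c] <= threshold or labels[r][c]:
                continue
            current += 1
            labels[r][c] = current
            stack = [(r, c)]
            while stack:
                i, j = stack.pop()
                for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if (0 <= a < n and 0 <= b < n
                            and grid[a][b] > threshold and not labels[a][b]):
                        labels[a][b] = current
                        stack.append((a, b))
    return labels, current


# Illustrative data: two groups of five points and a single outlier.
pts = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5), (1.2, 1.2),
       (6.5, 6.5), (7.5, 6.5), (6.5, 7.5), (7.5, 7.5), (6.2, 6.2),
       (3.5, 4.5)]
coarse = haar_approximation(quantize(pts, cells=8, lo=0.0, hi=8.0))
labels, n_clusters = connected_dense_cells(coarse, threshold=0.5)
print(n_clusters)
```

Averaging neighboring cells lowers the value of the isolated outlier cell relative to the cluster cells, which is why it falls below the threshold. Applying `haar_approximation` again would give a coarser view of the same data.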
[Figures: Quantization; Transformation]
WaveCluster
 Reasons for using Wavelet transformation in clustering
 Unsupervised clustering: it uses filters to emphasize regions where points cluster while
simultaneously suppressing weaker information at their boundaries
 Effective removal of outliers
 Multi-resolution
 Cost efficiency
 Major features:
 Complexity O(N)
 Detects arbitrarily shaped clusters at different scales
 Not sensitive to noise, not sensitive to input order
 Only applicable to low dimensional data
CLIQUE (Clustering In QUEst)
 Automatically identifies subspaces of a high-dimensional data space
that allow better clustering than the original space
 CLIQUE can be considered as both density-based and grid-based
 It partitions each dimension into the same number of equal-length
intervals
 It partitions an m-dimensional data space into non-overlapping
rectangular units
 A unit is dense if the fraction of total data points contained in the unit
exceeds the input model parameter (the density threshold)
 A cluster is a maximal set of connected dense units within a
subspace
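The notion of dense units in subspaces can be sketched as follows. The example assumes every dimension shares the same range [lo, hi), calls the density threshold `tau`, and only goes up to two-dimensional subspaces. It forms 2-D candidates from dimensions that already contain dense 1-D units, since a unit can only be dense if its projections are. Joining connected dense units into clusters is left out, and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def dense_units(points, subspace, intervals, lo, hi, tau):
    """Units of one subspace that hold more than a fraction tau of all points."""
    width = (hi - lo) / intervals
    counts = Counter(
        tuple(min(int((p[d] - lo) / width), intervals - 1) for d in subspace)
        for p in points
    )
    return {unit for unit, n in counts.items() if n / len(points) > tau}


def dense_subspaces(points, intervals, lo, hi, tau):
    """Map each 1-D or 2-D subspace containing dense units to those units."""
    found = {}
    for d in range(len(points[0])):
        units = dense_units(points, (d,), intervals, lo, hi, tau)
        if units:
            found[(d,)] = units
    one_d = sorted(s[0] for s in found)
    for pair in combinations(one_d, 2):  # candidates built from dense dimensions only
        units = dense_units(points, pair, intervals, lo, hi, tau)
        if units:
            found[pair] = units
    return found


# Illustrative data: dimensions 0 and 1 form two groups, dimension 2 is spread out.
records = [(1.1, 1.2, 0.5), (1.3, 1.4, 1.5), (1.5, 1.6, 2.5), (1.7, 1.8, 3.5),
           (1.9, 1.1, 4.5), (7.1, 7.2, 5.5), (7.3, 7.4, 6.5), (7.5, 7.6, 7.5),
           (7.7, 7.8, 8.5), (7.9, 7.1, 9.5)]
print(dense_subspaces(records, intervals=10, lo=0.0, hi=10.0, tau=0.2))
```

Here dimension 2 contains no dense unit, so no subspace involving it is examined, which is how the method narrows the search to subspaces that cluster well.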
