K-MEANS CLUSTERING
INTRODUCTION
What is clustering?
 Clustering is the classification of objects into different groups or, more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often defined by a distance measure.
K-MEANS CLUSTERING
 The k-means algorithm clusters n objects, based on their attributes, into k partitions, where k < n.
 It partitions N data points into K disjoint subsets Sj (the k clusters) so as to minimize the sum-of-squares criterion

    J = \sum_{j=1}^{K} \sum_{x_i \in S_j} \| x_i - c_j \|^2

where xi is a vector representing the i-th data point and cj is the geometric centroid of the data points in Sj (the j-th cluster).
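As an illustration (not part of the original slides), this criterion can be computed directly; the function and array names below are placeholders:

```python
import numpy as np

def sum_of_squares(points, labels, centroids):
    """Within-cluster sum of squares: J = sum over j of ||x_i - c_j||^2."""
    # points: (n, d) array; labels: (n,) cluster index per point; centroids: (K, d)
    diffs = points - centroids[labels]   # each point minus the centroid of its own cluster
    return float(np.sum(diffs ** 2))
```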
K-MEANS CLUSTERING
 Simply speaking, k-means clustering is an algorithm that classifies or groups objects, based on their attributes/features, into K groups.
 K is a positive integer.
 The grouping is done by minimizing the sum of squared distances between the data points and their corresponding cluster centroid.
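For orientation (this snippet is not from the slides), a basic run with scikit-learn looks as follows, assuming the library is installed; the data values are purely illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

# A toy data set: objects described by two attributes (illustrative values)
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # the group assigned to each object
print(km.cluster_centers_)  # the K cluster centroids
print(km.inertia_)          # sum of squared distances to the nearest centroid
```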
K-MEANS CLUSTERING
How does the K-Means clustering algorithm work?
 Initialization: once the number of clusters k has been chosen, k centroids are placed in the data space, for instance by choosing them at random.
 Assignment of objects to the centroids: each data object is assigned to its nearest centroid.
 Centroid update: the position of each cluster's centroid is updated to the average position of the objects belonging to that cluster.
 Step 1: Begin with a decision on the value of k = number of clusters.
 Step 2: Put any initial partition that classifies the data into k clusters. You may assign the training samples randomly, or systematically as follows:
1. Take the first k training samples as single-element clusters.
2. Assign each of the remaining (N − k) training samples to the cluster with the nearest centroid. After each assignment, recompute the centroid of the gaining cluster.
K-MEANS CLUSTERING
 Step 3: Take each sample in sequence and compute its distance from the centroid of each cluster. If a sample is not currently in the cluster with the closest centroid, switch it to that cluster and update the centroids of both the cluster gaining the sample and the cluster losing it.
 Step 4: Repeat Step 3 until convergence is achieved, that is, until a pass through the training samples causes no new assignments (see the sketch below).
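The steps above describe a sample-by-sample update. The sketch below uses the more common batch (Lloyd's) formulation, which alternates a full assignment pass with a full centroid update; it is an illustration, not code from the lecture, and the names are placeholders:

```python
import numpy as np

def k_means(points, k, n_iters=100, seed=0):
    """Batch (Lloyd's) k-means: alternate nearest-centroid assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Initialization: pick k distinct data points as the starting centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of the points assigned to it
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # convergence: no centroid moved, so no assignments change
        centroids = new_centroids
    return labels, centroids
```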
K-MEANS CLUSTERING
A simple example showing the implementation of the k-means algorithm (using K = 2)
Step 1:
Initialization: we randomly choose the following two centroids (k = 2) for the two clusters.
In this case the two centroids are m1 = (1.0, 1.0) and m2 = (5.0, 7.0).
Step 2:
 Distances from the individual points to the two centroids:

Individual    Distance to m1    Distance to m2
1             0                 7.21
2             1.12              6.10
3             3.61              3.61
4             7.21              0
5             4.72              2.50
6             5.31              2.06
7             4.30              2.92

 Thus, we obtain two clusters containing {1, 2, 3} and {4, 5, 6, 7}.
 Their new centroids are m1 ≈ (1.83, 2.33) and m2 = (4.125, 5.375).
Step 3:
 Now, using these new centroids, we compute the Euclidean distance of each object, as shown in the table:

Individual    Distance to m1    Distance to m2
1             1.57              5.38
2             0.47              4.28
3             2.04              1.78
4             5.64              1.84
5             3.15              0.73
6             3.78              0.54
7             2.74              1.08

 Therefore, the new clusters are {1, 2} and {3, 4, 5, 6, 7}.
 The next centroids are m1 = (1.25, 1.5) and m2 = (3.9, 5.1).
Step 4:
 With the centroids m1 = (1.25, 1.5) and m2 = (3.9, 5.1), the distances from the individual points to the two centroids are:

Individual    Distance to m1    Distance to m2
1             0.56              5.02
2             0.56              3.92
3             3.05              1.42
4             6.66              2.20
5             4.16              0.41
6             4.78              0.61
7             3.75              0.72

 The clusters obtained are again {1, 2} and {3, 4, 5, 6, 7}.
 Therefore, there is no change in the clusters.
 Thus, the algorithm halts here, and the final result consists of the two clusters {1, 2} and {3, 4, 5, 6, 7}.
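The coordinates of the seven individuals are not listed in the extracted text; the points below are an assumption that is consistent with all three distance tables above. A short sketch (illustrative, not from the slides) that reproduces the example:

```python
import numpy as np

# Coordinates assumed from the distance tables above (not listed explicitly on the slides)
points = np.array([
    [1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0],
    [3.5, 5.0], [4.5, 5.0], [3.5, 4.5],
])
centroids = np.array([[1.0, 1.0], [5.0, 7.0]])   # initial m1, m2 from Step 1

for _ in range(10):
    # Assign each individual to the nearest centroid, then recompute the centroids
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    new_centroids = np.array([points[labels == j].mean(axis=0) for j in range(2)])
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids

print(labels)     # [0 0 1 1 1 1 1]  ->  clusters {1,2} and {3,4,5,6,7}
print(centroids)  # approximately [[1.25 1.5 ], [3.9  5.1 ]]
```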
PLOT (with K = 3)
(figures: Step 1 and Step 2 of the algorithm on a sample data set)
Elbow Method (choosing the number of clusters)
 Plot the within-cluster sum of squares against k and choose the k at the "elbow", where adding more clusters stops giving a large reduction.
 Another method: the silhouette coefficient (self study).
(figure: elbow plot)
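A minimal elbow-method sketch, assuming scikit-learn and matplotlib are available; X stands for a (n_samples, n_features) data array and is a placeholder:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

def elbow_curve(X, k_max=10):
    inertias = []
    for k in range(1, k_max + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        inertias.append(km.inertia_)   # within-cluster sum of squares for this k
    plt.plot(range(1, k_max + 1), inertias, marker="o")
    plt.xlabel("number of clusters k")
    plt.ylabel("within-cluster sum of squares")
    plt.show()
```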
Weaknesses of K-Means Clustering
1. When the number of data points is small, the initial grouping determines the clusters significantly.
2. The number of clusters, K, must be determined beforehand. A further disadvantage is that the algorithm does not yield the same result on each run, since the resulting clusters depend on the initial random assignments.
3. We never know the "true" clusters: using the same data, a different input order may produce different clusters when the number of data points is small.
4. It is sensitive to the initial conditions. Different initializations may produce different clusterings, and the algorithm may get trapped in a local optimum (see the illustration below).
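To illustrate weaknesses 2 and 4 (this demonstration is not from the slides), k-means can be run several times with a single random initialization each; differing criterion values indicate different local optima. X is a placeholder for a (n_samples, n_features) data array:

```python
from sklearn.cluster import KMeans

def compare_initialisations(X, k=3, n_runs=5):
    # One random initialization per run: different seeds may end in different
    # local optima, giving different values of the sum-of-squares criterion.
    for seed in range(n_runs):
        km = KMeans(n_clusters=k, init="random", n_init=1, random_state=seed).fit(X)
        print(seed, round(km.inertia_, 2))
    # In practice, use several restarts (n_init > 1) or the default k-means++
    # initialization and keep the best run.
```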
Applications of K-Means Clustering
 It is relatively efficient and fast.
 k-means clustering can be applied to machine learning and data mining tasks.
 It has been used on acoustic data in speech understanding to convert waveforms into one of k categories, and for image segmentation (see the sketch below).
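A common way to illustrate the image-segmentation use case (an assumption, not shown on the slides) is to cluster pixel colours with k-means and replace each pixel by its cluster's centroid colour. The file names are placeholders, an RGB image is assumed, and Pillow plus scikit-learn are assumed to be available:

```python
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

img = np.asarray(Image.open("photo.jpg"), dtype=float) / 255.0   # placeholder file name
pixels = img.reshape(-1, 3)                                      # one RGB sample per pixel

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(pixels)
segmented = km.cluster_centers_[km.labels_].reshape(img.shape)   # each pixel -> its centroid colour

Image.fromarray((segmented * 255).astype(np.uint8)).save("segmented.jpg")
```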
Visualization: Example
(a sequence of figure-only slides)
Application: Segmentation
(figure-only slides)