K-Means Clustering
What is K-Means Clustering?
K-Means partitions objects into clusters such that objects within the same cluster are similar to each other and dissimilar to objects belonging to other clusters.
Can you explain this with an example?
Sure. To understand K-Means better, let's take an example from cricket.
Task: Identify bowlers and batsmen
• The data contains the runs scored and wickets taken by each player in the last 10 matches
• So, bowlers will have more wickets and batsmen will have higher runs
Assign data points
Here, we have our dataset with x and y coordinates. Now, we want to cluster this data using K-Means.
[Scatter plot of players by runs and wickets]
Cluster 1
We can see that this cluster has players with high runs and low wickets.
[Scatter plot with Cluster 1 highlighted]
Cluster 2
And here, we can see that this cluster has players with high wickets and low runs.
[Scatter plot with Cluster 1 and Cluster 2 highlighted]
Consider the same cricket dataset and solve the problem using K-Means.
Initially, two centroids are assigned randomly. The Euclidean distance is used to find out which centroid is closest to each data point, and the data points are assigned to the corresponding centroids.
Reposition the two centroids for optimization.
The process is repeated iteratively until the centroids become static.
What’s in it for you?
Types of Clustering
What is K-Means Clustering?
Applications of K-Means clustering
Common distance measures
How does K-Means clustering work?
K-Means Clustering Algorithm
Demo: K-Means Clustering
Use Case: Color Compression
Types of Clustering
• Hierarchical Clustering: Agglomerative, Divisive
• Partitional Clustering: K-Means, Fuzzy C-Means
Hierarchical Clustering
Clusters have a tree-like structure or a parent-child relationship.
Agglomerative Clustering
"Bottom up" approach: begin with each element as a separate cluster and merge them into successively larger clusters.
[Dendrogram: elements a, b, c, d, e, f merging into bc, de, def, bcdef, and finally abcdef]
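To see the merge order concretely, here is a minimal sketch of agglomerative clustering with SciPy; the six one-dimensional points and the a–f labels are illustrative assumptions, not data from the slides.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# six illustrative points standing in for the elements a..f
X = np.array([[1.0], [2.0], [2.5], [6.0], [6.5], [10.0]])

# "bottom up": start from single-element clusters and repeatedly merge the closest pair
Z = linkage(X, method='single')

# the dendrogram shows the successive merges as a tree
dendrogram(Z, labels=['a', 'b', 'c', 'd', 'e', 'f'])
plt.show()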
Divisive Clustering
"Top down" approach: begin with the whole set and proceed to divide it into successively smaller clusters.
[Dendrogram: abcdef splitting into bcdef, then bc and def, then de, and finally a, b, c, d, e, f]
Partitional Clustering
Division of objects into clusters such that each object is in exactly one cluster, not several.
[Diagram: two disjoint clusters with centroids c1 and c2]
Fuzzy C-Means
Division of objects into clusters such that each object can belong to multiple clusters.
[Diagram: two overlapping clusters with centroids c1 and c2]
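As a quick illustration of partial membership, here is a minimal sketch of the fuzzy C-Means membership computation; the one-dimensional points, the two fixed centers, and the fuzzifier m are all illustrative assumptions.

import numpy as np

# illustrative 1-D data and two fixed cluster centers
points = np.array([1.0, 1.5, 2.0, 8.0, 8.5, 9.0])
centers = np.array([1.5, 8.5])
m = 2.0  # fuzzifier; larger m gives softer memberships

# distance of every point to every center, shape (n_points, n_centers)
d = np.abs(points[:, None] - centers[None, :]) + 1e-12  # avoid division by zero

# fuzzy membership: u[j, i] = 1 / sum_k (d[j, i] / d[j, k]) ** (2 / (m - 1))
u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0)), axis=2)

print(u.round(3))  # each row sums to 1: every point belongs partly to both clusters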
Applications of K-Means Clustering
• Academic performance
• Wireless sensor networks
• Diagnostic systems
• Search engines
Distance Measure
The distance measure determines the similarity between two elements and influences the shape of the clusters. Common measures include:
• Euclidean distance measure
• Squared Euclidean distance measure
• Manhattan distance measure
• Cosine distance measure
Euclidean Distance Measure
• The Euclidean distance is the "ordinary" straight-line distance
• It is the distance between two points p and q in Euclidean space:

d = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}
Squared Euclidean Distance Measure
The squared Euclidean distance metric uses the same equation as the Euclidean distance metric, but does not take the square root:

d = \sum_{i=1}^{n} (q_i - p_i)^2
Manhattan Distance Measure
The Manhattan distance is the simple sum of the horizontal and vertical components, i.e. the distance between two points measured along axes at right angles:

d = \sum_{i=1}^{n} |q_i - p_i|

In two dimensions, for p = (p_x, p_y) and q = (q_x, q_y): d = |q_x - p_x| + |q_y - p_y|
Cosine Distance Measure
The cosine distance measure is based on the angle between two vectors p and q; the cosine of that angle is

d = \frac{\sum_{i=0}^{n-1} q_i p_i}{\sqrt{\sum_{i=0}^{n-1} q_i^2} \times \sqrt{\sum_{i=0}^{n-1} p_i^2}}
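All four measures are available in SciPy, so a quick sanity check is easy; the two points below are illustrative. Note that SciPy's cosine() returns one minus the cosine similarity given above.

import numpy as np
from scipy.spatial.distance import euclidean, sqeuclidean, cityblock, cosine

# two illustrative points
p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 6.0, 8.0])

print(euclidean(p, q))    # sqrt(3**2 + 4**2 + 5**2) ≈ 7.071
print(sqeuclidean(p, q))  # 3**2 + 4**2 + 5**2 = 50.0
print(cityblock(p, q))    # Manhattan distance: |3| + |4| + |5| = 12.0
print(cosine(p, q))       # 1 − cosine similarity ≈ 0.0074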
How does K-Means clustering work?

[Flowchart] Start → choose the number of clusters k (elbow point) → measure the distance of each data point to the centroids → group points based on minimum distance → check for convergence: if the clusters are unstable, reposition the centroids and repeat; if the clusters are stable, stop.
• Let's say you have a dataset for a grocery shop
• Now, the important question is, "how would you choose the optimum number of clusters?"
• The best way to do this is by the elbow method
• The idea of the elbow method is to run K-Means clustering on the dataset for a range of values of k, where k is the number of clusters
• The within sum of squares (WSS) is defined as the sum of the squared distances between each member of a cluster and its centroid:

WSS = \sum_{i=1}^{m} (x_i - c_i)^2

where x_i is a data point and c_i is the centroid closest to x_i
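As a sketch of the elbow method in code: scikit-learn's KMeans exposes WSS as the inertia_ attribute after fitting, so the curve can be drawn with a simple loop; the blob data below is an illustrative stand-in for the grocery-shop dataset.

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# illustrative two-cluster data standing in for the grocery-shop dataset
X, _ = make_blobs(n_samples=300, centers=2, cluster_std=0.8, random_state=0)

# fit K-Means for a range of k; inertia_ is exactly the WSS defined above
ks = range(1, 10)
wss = [KMeans(n_clusters=k, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(ks, wss, 'o-')
plt.xlabel('No. of clusters (k)')
plt.ylabel('WSS')
plt.show()  # the bend (elbow) in the curve suggests the number of clusters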
• Now, we draw a curve between WSS (within sum of squares) and the number of clusters
• Here, we can see a very slow change in the value of WSS after k = 2, so we take that elbow point as the final number of clusters

[Plot: WSS vs. no. of clusters, with the elbow point at k = 2]
Step 1: The given data points below are assumed to be delivery points.
[Scatter plot of the delivery points]
Step 2: We randomly initialize two points, called the cluster centroids. The Euclidean distance is used as the distance measure to find out which data points are closest to our centroids.
[Plot: data points with randomly placed centroids c1 and c2]
Step 3: Based on their distances from the centroids c1 and c2, the data points group themselves into clusters.
[Plot: data points grouped into two clusters around c1 and c2]
Step 4: Compute the centroid of the data points inside the blue cluster.
Step 5: Reposition the centroid of the blue cluster to this new centroid.
[Plot: blue cluster centroid repositioned]
Step 6: Now, compute the centroid of the data points inside the orange cluster.
Step 7: Reposition the centroid of the orange cluster to this new centroid.
[Plot: orange cluster centroid repositioned]
Step 8: Once the clusters become static, the K-Means clustering algorithm is said to have converged.
[Plot: final stable clusters with centroids c1 and c2]
K-Means Clustering Algorithm

Assuming we have inputs x1, x2, x3, … and a value of K:
Step 1: Pick K random points as cluster centers, called centroids
Step 2: Assign each xi to the nearest cluster by calculating its distance to each centroid
Step 3: Find the new cluster centers by taking the average of the assigned points
Step 4: Repeat Steps 2 and 3 until none of the cluster assignments change
Step 1:
We randomly pick K cluster centers (centroids). Let's assume these are c_1, c_2, …, c_k, and let C be the set of all centroids:

C = {c_1, c_2, …, c_k}

Step 2:
We assign each data point to its closest center by calculating the Euclidean distance:

\arg\min_{c_i \in C} \mathrm{dist}(c_i, x)^2

where dist() is the Euclidean distance.

Step 3:
We find the new centroid by taking the average of all the points assigned to that cluster:

c_i = \frac{1}{|S_i|} \sum_{x_i \in S_i} x_i

where S_i is the set of all points assigned to the i-th cluster.

Step 4:
We repeat Steps 2 and 3 until none of the cluster assignments change, that is, until the clusters remain stable.
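To see Steps 2 and 3 as array operations, here is a minimal sketch of a single K-Means iteration in NumPy; the four points and the two starting centroids are made up for illustration.

import numpy as np

# illustrative data and current centroids
X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
C = np.array([[1.0, 1.0], [9.0, 9.0]])

# Step 2: assign each point to the nearest centroid (squared Euclidean distance)
dist2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
labels = dist2.argmin(axis=1)  # -> array([0, 0, 1, 1])

# Step 3: each new centroid is the mean of the points assigned to it
C_new = np.array([X[labels == k].mean(axis=0) for k in range(len(C))])
print(C_new)  # [[1.25 1.5 ] [8.5  8.75]]

# Step 4 would repeat the two steps above until the labels stop changing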
Demo: K-Means Clustering
Problem Statement
• Walmart wants to open a chain of stores across Florida and wants to find the optimal store locations to maximize revenue
Solution
• Walmart already has a strong e-commerce presence
• Walmart can use its online customer data to analyze the customer locations along with the monthly
sales
%matplotlib inline
import matplotlib.pyplot as plt
# for plot styling
import seaborn as sns; sns.set()
import numpy as np
from sklearn.datasets import make_blobs  # samples_generator was removed in newer scikit-learn
X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
plt.scatter(X[:, 0], X[:, 1], s=50);
# output: scatter plot of the generated data
# assign four clusters
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=4)
kmeans.fit(X)
y_kmeans = kmeans.predict(X)

# import library
from sklearn.metrics import pairwise_distances_argmin

def find_clusters(X, n_clusters, rseed=2):
    # 1. randomly choose clusters
    rng = np.random.RandomState(rseed)
    i = rng.permutation(X.shape[0])[:n_clusters]
    centers = X[i]
    while True:
        # 2. assign labels based on closest center
        labels = pairwise_distances_argmin(X, centers)
        # 3. find new centers from means of points
        new_centers = np.array([X[labels == i].mean(0)
                                for i in range(n_clusters)])
        # 4. check for convergence
        if np.all(centers == new_centers):
            break
        centers = new_centers
    return centers, labels

centers, labels = find_clusters(X, 4)
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, s=50, cmap='viridis')
plt.scatter(centers[:, 0], centers[:, 1], c='black', s=200, alpha=0.5);
# output: clustered data points with the four centroids marked in black
Conclusion
Congratulations!
We have demonstrated K-Means clustering by placing Walmart stores across Florida in an optimal way
Use Case: K-Means for Color Compression
Problem Statement
To perform color compression on images using the K-Means algorithm
# example 1:
# note: this requires the ``pillow`` package to be installed
from sklearn.datasets import load_sample_image
china = load_sample_image("flower.jpg")
ax = plt.axes(xticks=[], yticks=[])
ax.imshow(china);
# output: the sample flower image
# returns the dimensions of the array
china.shape

# reshape the data to [n_samples x n_features], and rescale the colors so that they lie between 0 and 1
data = china / 255.0  # use 0...1 scale
data = data.reshape(427 * 640, 3)
data.shape

# visualize these pixels in this color space, using a subset of 10,000 pixels for efficiency
def plot_pixels(data, title, colors=None, N=10000):
    if colors is None:
        colors = data
    # choose a random subset
    rng = np.random.RandomState(0)
    i = rng.permutation(data.shape[0])[:N]
    colors = colors[i]
    R, G, B = data[i].T

    fig, ax = plt.subplots(1, 2, figsize=(16, 6))
    ax[0].scatter(R, G, color=colors, marker='.')
    ax[0].set(xlabel='Red', ylabel='Green', xlim=(0, 1), ylim=(0, 1))
    ax[1].scatter(R, B, color=colors, marker='.')
    ax[1].set(xlabel='Red', ylabel='Blue', xlim=(0, 1), ylim=(0, 1))
    fig.suptitle(title, size=20);
plot_pixels(data, title='Input color space: 16 million possible colors')
# suppress NumPy warnings
import warnings; warnings.simplefilter('ignore')

# reducing these 16 million colors to just 16 colors
from sklearn.cluster import MiniBatchKMeans
kmeans = MiniBatchKMeans(16)
kmeans.fit(data)
new_colors = kmeans.cluster_centers_[kmeans.predict(data)]
plot_pixels(data, colors=new_colors,
            title="Reduced color space: 16 colors")
china_recolored = new_colors.reshape(china.shape)

fig, ax = plt.subplots(1, 2, figsize=(16, 6), subplot_kw=dict(xticks=[], yticks=[]))
fig.subplots_adjust(wspace=0.05)
ax[0].imshow(china)
ax[0].set_title('Original Image', size=16)
ax[1].imshow(china_recolored)
ax[1].set_title('16-color Image', size=16);

# the result is a re-coloring of the original pixels, where each pixel is assigned the color of its closest cluster center
# output: original image next to its 16-color version
# example 2:
from sklearn.datasets import load_sample_image
china = load_sample_image("china.jpg")
ax = plt.axes(xticks=[], yticks=[])
ax.imshow(china);
# returns the dimensions of the array
china.shape

# reshape the data to [n_samples x n_features], and rescale the colors so that they lie between 0 and 1
data = china / 255.0  # use 0...1 scale
data = data.reshape(427 * 640, 3)
data.shape

# visualize these pixels in this color space, reusing plot_pixels() as defined in example 1
plot_pixels(data, title='Input color space: 16 million possible colors')
# reducing these 16 million colors to just 16 colors
kmeans = MiniBatchKMeans(16)
kmeans.fit(data)
new_colors = kmeans.cluster_centers_[kmeans.predict(data)]
plot_pixels(data, colors=new_colors,
            title="Reduced color space: 16 colors")
china_recolored = new_colors.reshape(china.shape)

fig, ax = plt.subplots(1, 2, figsize=(16, 6), subplot_kw=dict(xticks=[], yticks=[]))
fig.subplots_adjust(wspace=0.05)
ax[0].imshow(china)
ax[0].set_title('Original Image', size=16)
ax[1].imshow(china_recolored)
ax[1].set_title('16-color Image', size=16);

# the result is a re-coloring of the original pixels, where each pixel is assigned the color of its closest cluster center
# output: original image next to its 16-color version
Conclusion

Congratulations! We have demonstrated K-Means in color compression. This hands-on example will help you take on any K-Means project in the future.
Key Takeaways