CH-10
Unsupervised Learning and Clustering
By:
Arshad Farhad
20177716
Contents
 Supervised vs Unsupervised learning
 Introduction to clustering
 K-means Clustering
 Hierarchical clustering
 Conclusion
Supervised vs. Unsupervised Learning
 Supervised learning is where you have input variables (X) and an output variable (Y),
and you use an algorithm to learn the mapping function from the input to the
output:
Y = f(X)
 The goal is to approximate the mapping function so well that when you have new
input data (X), you can predict the output variable (Y) for that data.
 Unsupervised learning is where you only have input data (X) and no corresponding
output variables.
 The goal of unsupervised learning is to model the underlying structure or
distribution in the data in order to learn more about the data.
 Unsupervised learning problems can be further grouped into two types:
 Clustering
 Association
What is clustering?
• The organization of unlabeled data into similarity
groups called clusters.
• A cluster is a collection of data items that are "similar"
to one another and "dissimilar" to data items in other
clusters.
What do we need for clustering?
Distance (dissimilarity) measures
 The Euclidean distance between points i and j is the length of the line segment
connecting them.
 In Cartesian coordinates, if i = (i1, i2, …, in) and j = (j1, j2, …, jn), then the
distance d from i to j (or, equivalently, from j to i) is given by:
d(i, j) = √((i1 − j1)² + (i2 − j2)² + … + (in − jn)²)
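As a quick sketch, the distance formula above translates directly into a few lines of Python (the `euclidean` helper name is illustrative, not from the slides):

```python
import math

def euclidean(i, j):
    """Euclidean distance between two points given as equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(i, j)))

# The classic 3-4-5 right triangle: distance from (0, 0) to (3, 4) is 5.
print(euclidean((0, 0), (3, 4)))  # 5.0
```
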
Cluster Evaluation
• Intra-cluster cohesion (compactness):
– Cohesion measures how near the data points in a cluster
are to the cluster centroid.
– Sum of squared error (SSE) is a commonly used
measure.
• Inter-cluster separation (isolation):
– Separation means that different cluster centroids should
be far away from one another.
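To make the cohesion measure concrete, here is a minimal sketch of computing SSE over a clustering: for each cluster, sum the squared distances of its points to the cluster centroid. The `sse` helper and the toy data are assumptions for illustration.

```python
def sse(clusters, centroids):
    """Sum of squared errors: squared distance of every point to its cluster centroid."""
    total = 0.0
    for points, centroid in zip(clusters, centroids):
        for p in points:
            total += sum((a - b) ** 2 for a, b in zip(p, centroid))
    return total

# Two tight, well-separated clusters give a small SSE (high cohesion).
clusters = [[(1, 1), (2, 2)], [(8, 8), (9, 9)]]
centroids = [(1.5, 1.5), (8.5, 8.5)]
print(sse(clusters, centroids))  # 2.0
```

A lower SSE means the points sit closer to their centroids, i.e. more compact clusters.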
How many clusters?
Clustering Techniques
[Taxonomy diagram spanning three slides: hierarchical clustering (agglomerative, divisive) and partitional clustering (e.g., K-means)]
K-Means clustering
• K-means (MacQueen, 1967) is a partitional clustering
algorithm
• The k-means algorithm partitions the given data into
k clusters:
– Each cluster has a cluster center, called centroid.
– k is specified by the user
K-means algorithm
• Given k, the k-means algorithm works as follows:
1. Choose k (random) data points (seeds) to be the initial
centroids (cluster centers)
2. Assign each data point to the closest centroid
3. Re-compute the centroids using the current cluster
memberships
4. If a convergence criterion is not met, repeat steps 2 and 3
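The four steps above can be sketched in plain Python (a minimal illustration under stated assumptions, not an optimized implementation; the `kmeans` helper and the sample points are hypothetical):

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means following the four steps listed above."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # step 1: k random seeds
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                       # step 2: assign to closest centroid
            idx = min(range(k),
                      key=lambda c: sum((a - b) ** 2
                                        for a, b in zip(p, centroids[c])))
            clusters[idx].append(p)
        new_centroids = [                      # step 3: recompute centroids
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:         # step 4: stop when converged
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
```

On these well-separated toy points the algorithm converges in a couple of iterations to one cluster per group.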
K-means clustering example
• Step 1: Choose k (random) data points as the initial centroids.
• Step 2: Assign each data point to the closest centroid.
• Steps 3–4: Re-compute the centroids and repeat until convergence.
[Illustration spanning several slides, showing the assignments and centroids at each iteration]
Why use K-means?
• Strengths:
– Simple: easy to understand and to implement.
– Efficient: time complexity is O(tkn), where
– n is the number of data points,
– k is the number of clusters, and
– t is the number of iterations.
– Since both k and t are usually small, k-means is considered a linear
algorithm.
• K-means is the most popular clustering algorithm.
• Note that it terminates at a local optimum if SSE is used; the
global optimum is hard to find due to the complexity of the problem.
Weaknesses of K-means
• The algorithm is only applicable if the mean is
defined.
– For categorical data, k-modes can be used: the centroid is
represented by the most frequent values.
• The user needs to specify k.
• The algorithm is sensitive to outliers.
– Outliers are data points that are very far away
from other data points.
– Outliers could be errors in the data recording or some
special data points with very different values.
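The outlier sensitivity is easy to see with toy numbers: a single extreme value drags the mean (and hence a k-means centroid) far from where most of the points actually sit.

```python
# A single extreme value shifts the mean far from the bulk of the data.
points = [1.0, 2.0, 3.0]
with_outlier = points + [100.0]

mean = sum(points) / len(points)                           # 2.0
mean_with_outlier = sum(with_outlier) / len(with_outlier)  # 26.5

print(mean, mean_with_outlier)
```

This is why outlier removal (or a more robust centroid, such as the median used by k-medoids) is often applied before or instead of plain k-means.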
K-means summary
• Despite its weaknesses, k-means is still the most
popular algorithm due to its simplicity and efficiency.
• There is no clear evidence that any other clustering
algorithm performs better in general.
• Comparing different clustering algorithms is a
difficult task. No one knows the correct clusters!
Thank You!