© 2007 Prentice Hall 20-1
Chapter Outline
1) Overview
2) Basic Concept
3) Statistics Associated with Cluster Analysis
4) Conducting Cluster Analysis
i. Formulating the Problem
ii. Selecting a Distance or Similarity Measure
iii. Selecting a Clustering Procedure
iv. Deciding on the Number of Clusters
v. Interpreting and Profiling the Clusters
vi. Assessing Reliability and Validity
Statistics Associated with Cluster Analysis
• Agglomeration schedule. An agglomeration schedule
gives information on the objects or cases being combined
at each stage of a hierarchical clustering process.
• Cluster centroid. The cluster centroid is the set of mean
values of the variables, computed over all the cases or
objects in a particular cluster.
• Cluster centers. The cluster centers are the initial
starting points in nonhierarchical clustering. Clusters are
built around these centers, or seeds.
• Cluster membership. Cluster membership indicates the
cluster to which each object or case belongs.
Statistics Associated with Cluster Analysis
• Dendrogram. A dendrogram, or tree graph, is a
graphical device for displaying clustering results.
Vertical lines represent clusters that are joined
together. The position of the line on the scale
indicates the distances at which clusters were joined.
The dendrogram is read from left to right. Figure
20.8 is a dendrogram.
• Distances between cluster centers. These
distances indicate how separated the individual pairs
of clusters are. Clusters that are widely separated
are distinct, and therefore desirable.
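The joining distances that a dendrogram displays come from the agglomeration schedule. The following is a minimal sketch that builds such a schedule for four respondents from Table 20.1. It assumes single linkage (cluster distance = smallest distance between any two members) and Euclidean distance; the slides do not prescribe either choice here, and the function names are hypothetical.

```python
import math
from itertools import combinations

# Four respondents from Table 20.1 (case number -> ratings on V1..V6).
cases = {
    1: (6, 4, 7, 3, 2, 3),
    2: (2, 3, 1, 4, 5, 4),
    3: (7, 2, 6, 4, 1, 3),
    5: (1, 3, 2, 2, 6, 4),
}


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage_schedule(cases):
    """Return one (stage, cluster_a, cluster_b, distance) row per merge."""

    def gap(pair):
        # Assumption: single linkage, i.e. the closest pair of members.
        a, b = pair
        return min(euclidean(cases[i], cases[j]) for i in a for j in b)

    # Start with every case in its own cluster.
    clusters = [frozenset([case_id]) for case_id in sorted(cases)]
    schedule = []
    stage = 1
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=gap)
        schedule.append((stage, sorted(a), sorted(b), gap((a, b))))
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        stage += 1
    return schedule


schedule = single_linkage_schedule(cases)
for stage, a, b, distance in schedule:
    print(f"stage {stage}: join {a} and {b} at distance {distance:.3f}")
```

Each row's distance is the position on the dendrogram scale at which the corresponding two branches are joined. A large jump between consecutive stages suggests that two well-separated clusters were merged.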
Statistics Associated with Cluster Analysis
• Icicle diagram. An icicle diagram is a graphical
display of clustering results, so called because it
resembles a row of icicles hanging from the eaves of
a house. The columns correspond to the objects
being clustered, and the rows correspond to the
number of clusters. An icicle diagram is read from
bottom to top. Figure 20.7 is an icicle diagram.
• Similarity/distance coefficient matrix. A
similarity/distance coefficient matrix is a
lower-triangle matrix containing pairwise distances
between objects or cases.
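A lower-triangle distance matrix stores each pair of cases once, below the diagonal. The sketch below builds one for the first four respondents in Table 20.1, assuming Euclidean distance; the dictionary layout keyed by (row case, column case) is an illustrative choice, not something the slides specify.

```python
import math

# First four respondents from Table 20.1 (case number -> ratings on V1..V6).
cases = {
    1: (6, 4, 7, 3, 2, 3),
    2: (2, 3, 1, 4, 5, 4),
    3: (7, 2, 6, 4, 1, 3),
    4: (4, 6, 4, 5, 3, 6),
}


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def lower_triangle(cases):
    """Pairwise distances keyed by (row, column) with row > column only."""
    ids = sorted(cases)
    matrix = {}
    for i, row_id in enumerate(ids):
        for col_id in ids[:i]:
            matrix[(row_id, col_id)] = euclidean(cases[row_id], cases[col_id])
    return matrix


matrix = lower_triangle(cases)
ids = sorted(cases)
for i, row_id in enumerate(ids):
    cells = [f"{matrix[(row_id, col_id)]:6.2f}" for col_id in ids[:i]]
    print(row_id, " ".join(cells))
```

Because distance is symmetric and every case has zero distance to itself, the n(n-1)/2 entries below the diagonal carry all the information; for the four cases here that is six distances.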
Conducting Cluster Analysis
i. Formulate the Problem
ii. Select a Distance Measure
iii. Select a Clustering Procedure
iv. Decide on the Number of Clusters
v. Interpret and Profile Clusters
vi. Assess the Validity of Clustering
Fig. 20.3
Attitudinal Data For Clustering
Case No.  V1  V2  V3  V4  V5  V6
    1      6   4   7   3   2   3
    2      2   3   1   4   5   4
    3      7   2   6   4   1   3
    4      4   6   4   5   3   6
    5      1   3   2   2   6   4
    6      6   4   6   3   3   4
    7      5   3   6   3   3   4
    8      7   3   7   4   1   4
    9      2   4   3   3   6   3
   10      3   5   3   6   4   6
   11      1   3   2   3   5   3
   12      5   4   5   4   2   4
   13      2   2   1   5   4   4
   14      4   6   4   6   4   7
   15      6   5   4   2   1   4
   16      3   5   4   6   4   7
   17      4   4   7   2   2   5
   18      3   7   2   6   4   3
   19      4   6   3   7   2   7
   20      2   3   2   4   7
Table 20.1
Conducting Cluster Analysis
Formulate the Problem
• Perhaps the most important part of formulating the
clustering problem is selecting the variables on which
the clustering is based.
• Inclusion of even one or two irrelevant variables may
distort an otherwise useful clustering solution.
• Basically, the set of variables selected should describe
the similarity between objects in terms that are
relevant to the marketing research problem.
• The variables should be selected based on past
research, theory, or a consideration of the hypotheses
being tested. In exploratory research, the researcher
should exercise judgment and intuition.
Conducting Cluster Analysis
Select a Distance or Similarity Measure
• The most commonly used measure of similarity is the Euclidean
distance or its square (strictly a measure of dissimilarity: the
larger the distance, the less similar the objects). The Euclidean
distance is the square root of the sum of the squared differences
in values for each variable. Other distance measures are also
available. The city-block or Manhattan distance between two
objects is the sum of the absolute differences in values for each
variable. The Chebychev distance between two objects is the
maximum absolute difference in values for any variable.
• If the variables are measured in vastly different units, the
clustering solution will be influenced by the units of
measurement. In these cases, before clustering respondents,
we must standardize the data by rescaling each variable to have
a mean of zero and a standard deviation of unity. It is also
desirable to eliminate outliers (cases with atypical values).
• Use of different distance measures may lead to different
clustering results. Hence, it is advisable to use different
measures and compare the results.
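The three distance measures and the standardization step described above can be sketched as follows, using cases 1 to 3 from Table 20.1. The function names are illustrative, and the use of the sample standard deviation (n - 1 denominator) is an assumption; the slides do not say which form is intended.

```python
import math
from statistics import mean, stdev


def euclidean(a, b):
    """Square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebychev(a, b):
    """Maximum absolute difference on any single variable."""
    return max(abs(x - y) for x, y in zip(a, b))


def standardize(rows):
    """Rescale each variable (column) to mean 0 and standard deviation 1.

    Assumption: sample standard deviation. A variable with no variation
    would cause a division by zero and should be dropped beforehand.
    """
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    sds = [stdev(c) for c in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, sds)]
        for row in rows
    ]


# Cases 1 to 3 from Table 20.1 (ratings on V1..V6).
case1 = (6, 4, 7, 3, 2, 3)
case2 = (2, 3, 1, 4, 5, 4)
case3 = (7, 2, 6, 4, 1, 3)

print(euclidean(case1, case2))   # 8.0
print(manhattan(case1, case2))   # 16
print(chebychev(case1, case2))   # 6

z = standardize([case1, case2, case3])
print(euclidean(z[0], z[1]))
```

For cases 1 and 2 the differences are 4, 1, 6, 1, 3 and 1 in absolute value, so the three measures give 8, 16 and 6 respectively. Since all six variables in Table 20.1 share one rating scale, standardization matters less here than it would for variables in mixed units such as income and age.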
