INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)
ISSN 0976-6367 (Print), ISSN 0976-6375 (Online)
Volume 4, Issue 6, November-December (2013), pp. 78-82
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI), www.jifactor.com

DIVISIVE HIERARCHICAL CLUSTERING USING PARTITIONING
METHODS
Megha Gupta
M.Tech Scholar, Computer Science & Engineering,
Arya College of Engg & IT
Jaipur, Rajasthan, India
Vishal Shrivastava
Professor, Computer Science & Engineering
Arya College of Engg & IT
Jaipur, Rajasthan, India

ABSTRACT
Clustering is the process of partitioning a set of data into subsets, so that one set of data is collected on one side and another set of data on the other. Clustering can be done using many methods, such as partitioning methods, hierarchical methods, and density-based methods. A hierarchical method creates a hierarchical decomposition of the given set of data objects: in successive iterations, a cluster is split into smaller clusters, until eventually each object is in its own cluster or a termination condition holds. In this paper, a partitioning method has been combined with a hierarchical method, using several algorithms, to form better and improved clusters.
Keywords: Clustering, Hierarchical, Partitioning methods.
I. INTRODUCTION
Data Mining, also popularly known as Knowledge Discovery in Databases (KDD), refers to
the nontrivial extraction of implicit, previously unknown and potentially useful information from
data in databases. While data mining and knowledge discovery in databases (or KDD) are frequently
treated as synonyms, data mining is actually part of the knowledge discovery process.


The Knowledge Discovery in Databases process comprises a few steps leading from raw
data collections to some form of new knowledge [1]. The iterative process consists of the following
steps:
• Data cleaning: also known as data cleansing, a phase in which noisy and irrelevant data are
  removed from the collection.
• Data integration: at this stage, multiple data sources, often heterogeneous, may be combined into
  a common source.
• Data selection: at this step, the data relevant to the analysis is decided on and retrieved from the
  data collection.
• Data transformation: also known as data consolidation, a phase in which the selected data is
  transformed into forms appropriate for the mining procedure.
• Data mining: the crucial step in which clever techniques are applied to extract potentially useful
  patterns.
• Pattern evaluation: in this step, strictly interesting patterns representing knowledge are identified
  based on given measures.
• Knowledge representation: the final phase in which the discovered knowledge is visually
  represented to the user. This essential step uses visualization techniques to help users understand
  and interpret the data mining results.
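The preprocessing stages above can be sketched as a small chain of functions. This is only an illustrative sketch; all function names and the sample records are hypothetical, not from the paper.

```python
# Illustrative sketch of the KDD preprocessing pipeline as composable steps.
# The function names and sample data are hypothetical placeholders.

def clean(records):
    # Data cleaning: drop records with missing (noisy) values.
    return [r for r in records if None not in r.values()]

def select(records, fields):
    # Data selection: keep only the attributes relevant to the analysis.
    return [{f: r[f] for f in fields} for r in records]

def transform(records, field):
    # Data transformation: consolidate the chosen attribute into floats.
    return [float(r[field]) for r in records]

raw = [{"age": "34", "city": "Jaipur"}, {"age": None, "city": "Delhi"}]
prepared = transform(select(clean(raw), ["age"]), "age")
print(prepared)  # [34.0]
```

Each step consumes the output of the previous one, mirroring the iterative nature of the process described above.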
Clustering is the organization of data into classes. However, unlike classification, in clustering
the class labels are unknown and it is up to the clustering algorithm to discover acceptable classes.
Clustering is also called unsupervised classification, because the classification is not dictated by
given class labels. There are many clustering approaches, all based on the principle of maximizing the
similarity between objects in the same class (intra-class similarity) and minimizing the similarity
between objects of different classes (inter-class similarity) [2].
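The intra-class versus inter-class principle can be made concrete with average pairwise distances: within a good cluster the average distance is small, and across clusters it is large. This is a minimal sketch with made-up 2-D points, not data from the paper.

```python
import math

def dist(p, q):
    # Euclidean distance between two points.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def avg_pairwise(points_a, points_b):
    # Average distance over all pairs (excluding a point paired with itself).
    pairs = [(p, q) for p in points_a for q in points_b if p is not q]
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

c1 = [(0.0, 0.0), (0.0, 1.0)]      # one tight cluster
c2 = [(5.0, 5.0), (5.0, 6.0)]      # another, far away
intra = avg_pairwise(c1, c1)       # intra-class distance (similarity is high)
inter = avg_pairwise(c1, c2)       # inter-class distance (similarity is low)
print(intra < inter)  # True
```

A clustering objective rewards exactly this configuration: small `intra`, large `inter`.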

II. RELATED WORK
Hierarchical Clustering for Data-mining
Hierarchical methods for supervised and unsupervised data mining give a multilevel
description of data. They are relevant to many applications related to information extraction, retrieval,
navigation and organization. Two interpretation techniques have been used for description of the
clusters:
1. Listing of prototypical data examples from the cluster.
2. Listing of typical features associated with the cluster.
The Generalizable Gaussian Mixture model (GGM) and the soft Generalizable Gaussian
Mixture model (SGGM) are addressed for supervised and unsupervised learning. These two models
estimate the parameters of the Gaussian clusters with a modified EM procedure operating on two
disjoint sets of observations, which helps ensure high generalization ability [3].
Procedure
The agglomerative clustering scheme starts from the k clusters at level j = 1, as given by the
optimized GGM model of p(x). At each higher level in the hierarchy, two clusters are merged based
on a similarity measure between pairs of clusters. The procedure is repeated until the top level: at
level j = 1 there are k clusters, and at the final level, j = 2k − 1, there is a single cluster. The natural
distance measure between the cluster densities is the Kullback-Leibler (KL) divergence, since it
reflects dissimilarity between the densities in the probabilistic space.
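For univariate Gaussians the KL divergence has a well-known closed form, KL(N(μ0, σ0²) ‖ N(μ1, σ1²)) = log(σ1/σ0) + (σ0² + (μ0 − μ1)²)/(2σ1²) − 1/2, which makes the first merge level easy to compute. A minimal sketch (the paper's mixture setting is multivariate; this one-dimensional case is only for illustration):

```python
import math

def kl_gauss(mu0, sd0, mu1, sd1):
    # Closed-form KL divergence KL(N(mu0, sd0^2) || N(mu1, sd1^2))
    # between two univariate Gaussian densities.
    return (math.log(sd1 / sd0)
            + (sd0 ** 2 + (mu0 - mu1) ** 2) / (2 * sd1 ** 2)
            - 0.5)

# Identical densities have zero divergence; distant means diverge more,
# which is why KL works as a dissimilarity between cluster densities.
print(kl_gauss(0.0, 1.0, 0.0, 1.0))  # 0.0
print(kl_gauss(0.0, 1.0, 3.0, 1.0))  # 4.5
```

Note that KL is asymmetric; a merging scheme would typically symmetrize it, e.g. by summing the two directions.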


Limitations
The drawback is that the KL divergence admits an analytical expression only at the first level
of the hierarchy, while distances at subsequent levels have to be approximated.
Automatically Labeling Hierarchical Clustering
A simple algorithm has been used that automatically assigns labels to hierarchical clusters.
The algorithm evaluates candidate labels using information from the cluster, the parent cluster and
corpus statistics. A trainable threshold enables the algorithm to assign just a few high-quality labels
to each cluster. First, it is assumed that the algorithm has access to a general collection of documents
E, representing the word distribution of general English. This English corpus is used in selecting
label candidates [4].
Procedure
Given a cluster S and its parent cluster P, which includes all documents in S and in the sibling
clusters of S, the algorithm selects labels for the cluster with the following steps:
1. Collect phrase statistics: for every unigram, bigram and trigram phrase p occurring in the
   cluster S, calculate the document frequency and term frequency statistics for the cluster, the
   parent cluster and the general English corpus.
2. Select label candidates: select the label candidates from the unigram, bigram and trigram
   phrases based on document frequency in the cluster and in general English.
3. Calculate the descriptive score: calculate the descriptive score for each label candidate, and then
   sort the label candidates by these scores.
4. Calculate the cutoff point: decide how many label candidates to display based on the descriptive
   scores.
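The intuition behind the descriptive score can be sketched with a simple frequency ratio: a phrase is a good label when it is much more frequent in the cluster than in the general English corpus. The scoring function below is a hypothetical simplification (the paper's actual score combines several weighted cluster, parent and corpus statistics) and the documents are made up.

```python
from collections import Counter

def label_scores(cluster_docs, corpus_docs):
    # Hypothetical descriptive score: ratio of a term's relative frequency
    # in the cluster to its (smoothed) relative frequency in the corpus.
    cluster_tf = Counter(w for d in cluster_docs for w in d.split())
    corpus_tf = Counter(w for d in corpus_docs for w in d.split())
    n_cluster = sum(cluster_tf.values())
    n_corpus = sum(corpus_tf.values())
    return {w: (cluster_tf[w] / n_cluster) / ((corpus_tf[w] + 1) / n_corpus)
            for w in cluster_tf}

cluster = ["protein protein expression", "protein folding"]
corpus = ["the cat sat", "protein news", "weather report today"]
scores = label_scores(cluster, corpus)
best = max(scores, key=scores.get)
print(best)  # protein
```

Sorting candidates by such a score and cutting off at a trained threshold corresponds to steps 3 and 4 above.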
Limitation
Most errors come from clusters containing small numbers of documents. The small
number of observations in small clusters can make good and bad labels indistinguishable; minor
variations in vocabulary can also produce statistical features with high variance.
Fast Hierarchical Clustering and Other Applications of Dynamic Closest Pairs
This work presents data structures for dynamic closest-pair problems with arbitrary distance
functions, based on a technique used for Euclidean closest pairs. It shows how to insert or delete an
object from an n-object set while maintaining the closest pair, in O(n log² n) time per update and O(n)
space. The purpose of the paper is to show that much better bounds are possible using data structures
that are simple. If linear space is required, this represents an order-of-magnitude improvement.
Procedure
The data structure consists of a partition of the dynamic set S into k ≤ log n subsets S1,
S2, …, Sk, together with a digraph Gi for each subset Si. Initially all points are in S1 and G1 has n − 1
edges. Gi may contain edges with neither endpoint in Si; if the number of edges in all graphs grows
to 2n, the data structure is rebuilt by moving all points to S1 and recomputing G1. The closest
pair is represented by an edge in some Gi, so the pair can be found by scanning the edges
in all graphs [5].
To create Gi for a new subset Si, start with a single path: choose the first vertex of the path to
be any object in Si, then extend the path one edge at a time. When the last vertex of the path P is in
Si, choose the next vertex to be its nearest neighbor in S − P; when the last vertex is in S − Si, choose
the next vertex to be its nearest neighbor in Si − P. Continue until the path can no longer be extended
because S − P or Si − P is empty.
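The alternating path construction can be sketched directly from that description. This is a plain quadratic-time sketch for illustration, using 1-D points and absolute difference as the distance; the paper's structure adds the bookkeeping needed for the stated update bounds.

```python
def build_path(S, Si, dist):
    # Sketch of the nearest-neighbor path construction described above.
    # From a vertex in Si, step to the nearest unvisited point of all of S;
    # from a vertex outside Si, step to the nearest unvisited point of Si.
    in_si = set(Si)
    path = [Si[0]]
    on_path = {Si[0]}
    while True:
        last = path[-1]
        pool = [p for p in (S if last in in_si else Si) if p not in on_path]
        if not pool:           # S - P or Si - P is empty: stop extending
            return path
        nxt = min(pool, key=lambda p: dist(last, p))
        path.append(nxt)
        on_path.add(nxt)

# 1-D example: S = {0, 1, 2, 10}, new subset Si = {0, 10}.
print(build_path([0, 1, 2, 10], [0, 10], lambda a, b: abs(a - b)))
# [0, 1, 10, 2]
```

Each path edge records a nearest-neighbor relation, which is why the closest pair always appears as an edge of some Gi.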

Merge partitions: the update operations can cause k to become too large relative to n. If so,
choose subsets Si and Sj as close to equal in size as possible, i.e. with |Si| ≤ |Sj| and |Sj| / |Si|
minimized, merge these two subsets into one, and create the graph for the merged subset.
To insert x, create a new subset Sk+1 = {x} in the partition of S, create Gk+1, and merge
partitions as necessary until k ≤ log n.
To delete x, create a new subset Sk+1 consisting of all objects y such that (y, x) is a directed
edge in some Gi. Remove x and all its adjacent edges from all the graphs Gi. Create the graph Gk+1
for Sk+1, and merge partitions as necessary until k ≤ log n.
Theorem: the data structure above maintains the closest pair in S in O(n) space, amortized time
O(n log n) per insertion, and amortized time O(n log² n) per deletion.
Limitations
The methods tested involve sequential scans through memory, a behavior known to
reduce the effectiveness of cached memory.
Motivations
Using hierarchical clustering, better clusters will be formed: the resulting clusters will be
better separated, with tight bonding among the members of each cluster. That is, the clusters formed
will be refined using the various algorithms of hierarchical clustering.
III. PROBLEM STATEMENT
The objective of the proposed work is to perform hierarchical clustering to obtain more
refined clusters with a strong relationship between members of the same cluster.
IV. PROPOSED APPROACH
In this paper, we have used K-means algorithm and CFNG to find better and improved clusters.
K-means Algorithm
Suppose a data set, D, contains n objects in Euclidean space. Partitioning methods distribute
the objects into k clusters, C1, …, Ck, such that Ci ⊂ D and Ci ∩ Cj = Ø for 1 ≤ i, j ≤ k, i ≠ j. An
objective function is used to assess the partitioning quality, so that objects within a cluster are similar
to one another but dissimilar to objects in other clusters. That is, the objective function aims for high
intracluster similarity and low intercluster similarity [6].
A centroid-based partitioning technique uses the centroid of a cluster, Ci, to represent that
cluster. The centroid of a cluster is its center point, and can be defined in various ways, such as by
the mean or medoid of the objects assigned to the cluster. The difference between an object p
and Ci, the representative of the cluster, is measured by dist(p, Ci), where dist(x, y) is the Euclidean
distance between two points x and y.
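The centroid-based scheme just described is exactly the k-means loop: assign each object to the nearest centroid, then recompute each centroid as the mean of its assigned objects. A minimal sketch (naive seeding and toy 2-D points chosen for illustration; production implementations use better initialization such as k-means++):

```python
import math

def kmeans(points, k, iters=20):
    # Minimal k-means sketch: alternate assignment and centroid update.
    centroids = points[:k]  # naive seeding, for illustration only
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign p to the cluster whose centroid is nearest (Euclidean).
            i = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[i].append(p)
        # Recompute each centroid as the coordinate-wise mean of its cluster.
        centroids = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl
                     else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids, clusters

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
cents, cls = kmeans(pts, 2)
print(sorted(cents))  # [(0.0, 0.5), (10.0, 10.5)]
```

The two recovered centroids sit at the means of the two obvious groups, minimizing the within-cluster distances that the objective function measures.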
CFNG
The colored farthest neighbor graph (CFNG) shares many characteristics with the shared
farthest neighbors (SFN) method of Rovetta and Masulli [7]. The CFNG algorithm yields binary
partitions of objects into subsets, whereas the number of subsets obtained by SFN can vary. The SFN
algorithm can easily split a cluster where no natural partition exists, while the CFNG often avoids
such splits.
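The flavor of a farthest-neighbor binary split can be sketched as follows. This is a deliberate simplification, not the actual CFNG construction (which the paper does not detail here): it takes the two mutually farthest points as the two "colors" and assigns every point to the nearer seed, yielding a binary partition.

```python
import math

def farthest_pair_split(points):
    # Illustrative binary split in the spirit of farthest-neighbor methods:
    # seed with the two mutually farthest points, then assign each point
    # to the nearer seed. NOT the paper's CFNG algorithm, only a sketch.
    a, b = max(((p, q) for p in points for q in points),
               key=lambda pq: math.dist(*pq))
    left = [p for p in points if math.dist(p, a) <= math.dist(p, b)]
    right = [p for p in points if math.dist(p, a) > math.dist(p, b)]
    return left, right

left, right = farthest_pair_split([(0, 0), (1, 0), (9, 0), (10, 0)])
print(left, right)  # [(0, 0), (1, 0)] [(9, 0), (10, 0)]
```

Always producing exactly two subsets per split is what makes such a method a natural building block for divisive hierarchical clustering.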


V. RESULTS
To observe the effect of hierarchical clustering, the k-means and CFNG algorithms are used, and
to observe the results, the experimental setup was designed using Java and MySQL. The obtained
results are compared with K-means and CFNG when each is executed individually.

Figure 1: Comparison of the Proposed Algorithm with K-means and CFNG
VI. CONCLUSION AND FUTURE SCOPE
We have obtained better and improved clusters using the K-means and CFNG algorithms
hierarchically. The final clusters obtained are tightly bonded with each other.
In this paper, we have used two different algorithms for hierarchical clustering. Instead of
CFNG, another hierarchical clustering algorithm could have been used.
REFERENCES
[1]  Osmar R. Zaïane, "Principles of Knowledge Discovery in Databases", 1999.
[2]  Jiawei Han, Micheline Kamber, Jian Pei, "Data Mining: Concepts and Techniques".
[3]  Osmar R. Zaïane, "Principles of Knowledge Discovery in Databases", 1999.
[4]  Pucktada Treeratpituk, Jamie Callan, "Automatically Labeling Hierarchical Clusters".
[5]  David Eppstein, "Fast Hierarchical Clustering and Other Applications of Dynamic Closest
     Pairs".
[6]  G. Plaxton, "Approximation Algorithms for Hierarchical Location Problems", Proceedings of
     the 35th ACM Symposium on the Theory of Computing, 2003.
[7]  A. Borodin, R. Ostrovsky and Y. Rabani, "Subquadratic Approximation Algorithms for
     Clustering Problems in High Dimensional Spaces", Proceedings of the 31st ACM Symposium
     on Theory of Computing, 1999.
[8]  Rinal H. Doshi, Dr. Harshad B. Bhadka and Richa Mehta, "Development of Pattern
     Knowledge Discovery Framework using Clustering Data Mining Algorithm", International
     Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 3, 2013,
     pp. 101-112, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[9]  Deepika Khurana and Dr. M.P.S Bhatia, "Dynamic Approach to K-Means Clustering
     Algorithm", International Journal of Computer Engineering & Technology (IJCET),
     Volume 4, Issue 3, 2013, pp. 204-219, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[10] Meghana N. Ingole, M.S. Bewoor, S.H. Patil, "Context Sensitive Text Summarization using
     Hierarchical Clustering Algorithm", International Journal of Computer Engineering &
     Technology (IJCET), Volume 3, Issue 1, 2012, pp. 322-329, ISSN Print: 0976-6367,
     ISSN Online: 0976-6375.

More Related Content

PDF
Scalable and efficient cluster based framework for
eSAT Publishing House
 
PDF
Scalable and efficient cluster based framework for multidimensional indexing
eSAT Journals
 
PDF
Dynamic approach to k means clustering algorithm-2
IAEME Publication
 
PDF
50120130406022
IAEME Publication
 
PDF
Top-K Dominating Queries on Incomplete Data with Priorities
ijtsrd
 
PDF
K-means Clustering Method for the Analysis of Log Data
idescitation
 
PDF
Parallel KNN for Big Data using Adaptive Indexing
IRJET Journal
 
PDF
Cg33504508
IJERA Editor
 
Scalable and efficient cluster based framework for
eSAT Publishing House
 
Scalable and efficient cluster based framework for multidimensional indexing
eSAT Journals
 
Dynamic approach to k means clustering algorithm-2
IAEME Publication
 
50120130406022
IAEME Publication
 
Top-K Dominating Queries on Incomplete Data with Priorities
ijtsrd
 
K-means Clustering Method for the Analysis of Log Data
idescitation
 
Parallel KNN for Big Data using Adaptive Indexing
IRJET Journal
 
Cg33504508
IJERA Editor
 

What's hot (20)

PDF
Premeditated Initial Points for K-Means Clustering
IJCSIS Research Publications
 
PDF
A046010107
IJERA Editor
 
PDF
A Novel Multi- Viewpoint based Similarity Measure for Document Clustering
IJMER
 
PDF
Comparative Analysis of Algorithms for Single Source Shortest Path Problem
CSCJournals
 
PDF
Af4201214217
IJERA Editor
 
PDF
A frame work for clustering time evolving data
iaemedu
 
PDF
A Novel Approach for Clustering Big Data based on MapReduce
IJECEIAES
 
PDF
Big data Clustering Algorithms And Strategies
Farzad Nozarian
 
PDF
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
IOSR Journals
 
PDF
UNIT III NON LINEAR DATA STRUCTURES – TREES
Kathirvel Ayyaswamy
 
PDF
Analysis and implementation of modified k medoids
eSAT Publishing House
 
PDF
IRJET- A Survey of Text Document Clustering by using Clustering Techniques
IRJET Journal
 
PDF
IRJET- Customer Segmentation from Massive Customer Transaction Data
IRJET Journal
 
PDF
Big Data Clustering Model based on Fuzzy Gaussian
IJCSIS Research Publications
 
PDF
F04463437
IOSR-JEN
 
PDF
Finding Relationships between the Our-NIR Cluster Results
CSCJournals
 
PDF
Privacy preserving clustering on centralized data through scaling transf
IAEME Publication
 
PDF
50120140505013
IAEME Publication
 
PDF
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
IRJET Journal
 
Premeditated Initial Points for K-Means Clustering
IJCSIS Research Publications
 
A046010107
IJERA Editor
 
A Novel Multi- Viewpoint based Similarity Measure for Document Clustering
IJMER
 
Comparative Analysis of Algorithms for Single Source Shortest Path Problem
CSCJournals
 
Af4201214217
IJERA Editor
 
A frame work for clustering time evolving data
iaemedu
 
A Novel Approach for Clustering Big Data based on MapReduce
IJECEIAES
 
Big data Clustering Algorithms And Strategies
Farzad Nozarian
 
K Means Clustering Algorithm for Partitioning Data Sets Evaluated From Horizo...
IOSR Journals
 
UNIT III NON LINEAR DATA STRUCTURES – TREES
Kathirvel Ayyaswamy
 
Analysis and implementation of modified k medoids
eSAT Publishing House
 
IRJET- A Survey of Text Document Clustering by using Clustering Techniques
IRJET Journal
 
IRJET- Customer Segmentation from Massive Customer Transaction Data
IRJET Journal
 
Big Data Clustering Model based on Fuzzy Gaussian
IJCSIS Research Publications
 
F04463437
IOSR-JEN
 
Finding Relationships between the Our-NIR Cluster Results
CSCJournals
 
Privacy preserving clustering on centralized data through scaling transf
IAEME Publication
 
50120140505013
IAEME Publication
 
MPSKM Algorithm to Cluster Uneven Dimensional Time Series Subspace Data
IRJET Journal
 
Ad

Viewers also liked (6)

PDF
Comparative analysis of various data stream mining procedures and various dim...
Alexander Decker
 
PDF
QUALITY OF CLUSTER INDEX BASED ON STUDY OF DECISION TREE
IJORCS
 
PDF
Improved Performance of Unsupervised Method by Renovated K-Means
IJASCSE
 
PPTX
Trabajo de musica 3 diver adrian naranjo hernandez ajajjajajajja
adriannaranjo3
 
PDF
WHAT DOES MEAN “GOD”?... (A New theory on “RELIGION”)
IJERD Editor
 
Comparative analysis of various data stream mining procedures and various dim...
Alexander Decker
 
QUALITY OF CLUSTER INDEX BASED ON STUDY OF DECISION TREE
IJORCS
 
Improved Performance of Unsupervised Method by Renovated K-Means
IJASCSE
 
Trabajo de musica 3 diver adrian naranjo hernandez ajajjajajajja
adriannaranjo3
 
WHAT DOES MEAN “GOD”?... (A New theory on “RELIGION”)
IJERD Editor
 
Ad

Similar to 50120130406008 (20)

PDF
Clustering Approach Recommendation System using Agglomerative Algorithm
IRJET Journal
 
PDF
Bs31267274
IJMER
 
PDF
Paper id 26201478
IJRAT
 
PDF
Ir3116271633
IJERA Editor
 
PDF
A Competent and Empirical Model of Distributed Clustering
IRJET Journal
 
PDF
A h k clustering algorithm for high dimensional data using ensemble learning
ijitcs
 
PDF
Cancer data partitioning with data structure and difficulty independent clust...
IRJET Journal
 
PDF
50120140505015 2
IAEME Publication
 
PDF
Du35687693
IJERA Editor
 
PDF
Similarity distance measures
thilagasna
 
PDF
A new link based approach for categorical data clustering
International Journal of Science and Research (IJSR)
 
PDF
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
csandit
 
PDF
Survey on traditional and evolutionary clustering
eSAT Publishing House
 
PDF
Survey on traditional and evolutionary clustering approaches
eSAT Journals
 
PDF
4 image segmentation through clustering
IAEME Publication
 
PDF
4 image segmentation through clustering
prjpublications
 
PDF
Enhanced Clustering Algorithm for Processing Online Data
IOSR Journals
 
PDF
Multilevel techniques for the clustering problem
csandit
 
PDF
An Analysis On Clustering Algorithms In Data Mining
Gina Rizzo
 
PDF
A comprehensive survey of contemporary
prjpublications
 
Clustering Approach Recommendation System using Agglomerative Algorithm
IRJET Journal
 
Bs31267274
IJMER
 
Paper id 26201478
IJRAT
 
Ir3116271633
IJERA Editor
 
A Competent and Empirical Model of Distributed Clustering
IRJET Journal
 
A h k clustering algorithm for high dimensional data using ensemble learning
ijitcs
 
Cancer data partitioning with data structure and difficulty independent clust...
IRJET Journal
 
50120140505015 2
IAEME Publication
 
Du35687693
IJERA Editor
 
Similarity distance measures
thilagasna
 
A new link based approach for categorical data clustering
International Journal of Science and Research (IJSR)
 
FAST ALGORITHMS FOR UNSUPERVISED LEARNING IN LARGE DATA SETS
csandit
 
Survey on traditional and evolutionary clustering
eSAT Publishing House
 
Survey on traditional and evolutionary clustering approaches
eSAT Journals
 
4 image segmentation through clustering
IAEME Publication
 
4 image segmentation through clustering
prjpublications
 
Enhanced Clustering Algorithm for Processing Online Data
IOSR Journals
 
Multilevel techniques for the clustering problem
csandit
 
An Analysis On Clustering Algorithms In Data Mining
Gina Rizzo
 
A comprehensive survey of contemporary
prjpublications
 

More from IAEME Publication (20)

PDF
IAEME_Publication_Call_for_Paper_September_2022.pdf
IAEME Publication
 
PDF
MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...
IAEME Publication
 
PDF
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
IAEME Publication
 
PDF
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
IAEME Publication
 
PDF
DETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONS
IAEME Publication
 
PDF
ANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONS
IAEME Publication
 
PDF
VOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINO
IAEME Publication
 
PDF
IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...
IAEME Publication
 
PDF
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMY
IAEME Publication
 
PDF
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...
IAEME Publication
 
PDF
GANDHI ON NON-VIOLENT POLICE
IAEME Publication
 
PDF
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
IAEME Publication
 
PDF
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
IAEME Publication
 
PDF
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
IAEME Publication
 
PDF
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
IAEME Publication
 
PDF
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
IAEME Publication
 
PDF
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
IAEME Publication
 
PDF
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
IAEME Publication
 
PDF
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
IAEME Publication
 
PDF
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
IAEME Publication
 
IAEME_Publication_Call_for_Paper_September_2022.pdf
IAEME Publication
 
MODELING AND ANALYSIS OF SURFACE ROUGHNESS AND WHITE LATER THICKNESS IN WIRE-...
IAEME Publication
 
A STUDY ON THE REASONS FOR TRANSGENDER TO BECOME ENTREPRENEURS
IAEME Publication
 
BROAD UNEXPOSED SKILLS OF TRANSGENDER ENTREPRENEURS
IAEME Publication
 
DETERMINANTS AFFECTING THE USER'S INTENTION TO USE MOBILE BANKING APPLICATIONS
IAEME Publication
 
ANALYSE THE USER PREDILECTION ON GPAY AND PHONEPE FOR DIGITAL TRANSACTIONS
IAEME Publication
 
VOICE BASED ATM FOR VISUALLY IMPAIRED USING ARDUINO
IAEME Publication
 
IMPACT OF EMOTIONAL INTELLIGENCE ON HUMAN RESOURCE MANAGEMENT PRACTICES AMONG...
IAEME Publication
 
VISUALISING AGING PARENTS & THEIR CLOSE CARERS LIFE JOURNEY IN AGING ECONOMY
IAEME Publication
 
A STUDY ON THE IMPACT OF ORGANIZATIONAL CULTURE ON THE EFFECTIVENESS OF PERFO...
IAEME Publication
 
GANDHI ON NON-VIOLENT POLICE
IAEME Publication
 
A STUDY ON TALENT MANAGEMENT AND ITS IMPACT ON EMPLOYEE RETENTION IN SELECTED...
IAEME Publication
 
ATTRITION IN THE IT INDUSTRY DURING COVID-19 PANDEMIC: LINKING EMOTIONAL INTE...
IAEME Publication
 
INFLUENCE OF TALENT MANAGEMENT PRACTICES ON ORGANIZATIONAL PERFORMANCE A STUD...
IAEME Publication
 
A STUDY OF VARIOUS TYPES OF LOANS OF SELECTED PUBLIC AND PRIVATE SECTOR BANKS...
IAEME Publication
 
EXPERIMENTAL STUDY OF MECHANICAL AND TRIBOLOGICAL RELATION OF NYLON/BaSO4 POL...
IAEME Publication
 
ROLE OF SOCIAL ENTREPRENEURSHIP IN RURAL DEVELOPMENT OF INDIA - PROBLEMS AND ...
IAEME Publication
 
OPTIMAL RECONFIGURATION OF POWER DISTRIBUTION RADIAL NETWORK USING HYBRID MET...
IAEME Publication
 
APPLICATION OF FRUGAL APPROACH FOR PRODUCTIVITY IMPROVEMENT - A CASE STUDY OF...
IAEME Publication
 
A MULTIPLE – CHANNEL QUEUING MODELS ON FUZZY ENVIRONMENT
IAEME Publication
 

Recently uploaded (20)

PPTX
New ThousandEyes Product Innovations: Cisco Live June 2025
ThousandEyes
 
PDF
Presentation about Hardware and Software in Computer
snehamodhawadiya
 
PDF
Using Anchore and DefectDojo to Stand Up Your DevSecOps Function
Anchore
 
PPTX
cloud computing vai.pptx for the project
vaibhavdobariyal79
 
PDF
Accelerating Oracle Database 23ai Troubleshooting with Oracle AHF Fleet Insig...
Sandesh Rao
 
PDF
Responsible AI and AI Ethics - By Sylvester Ebhonu
Sylvester Ebhonu
 
PDF
Structs to JSON: How Go Powers REST APIs
Emily Achieng
 
PDF
AI Unleashed - Shaping the Future -Starting Today - AIOUG Yatra 2025 - For Co...
Sandesh Rao
 
PPTX
AI in Daily Life: How Artificial Intelligence Helps Us Every Day
vanshrpatil7
 
PDF
Google I/O Extended 2025 Baku - all ppts
HusseinMalikMammadli
 
PPTX
OA presentation.pptx OA presentation.pptx
pateldhruv002338
 
PDF
Cloud-Migration-Best-Practices-A-Practical-Guide-to-AWS-Azure-and-Google-Clou...
Artjoker Software Development Company
 
PDF
Software Development Methodologies in 2025
KodekX
 
PDF
Make GenAI investments go further with the Dell AI Factory
Principled Technologies
 
PDF
Orbitly Pitch Deck|A Mission-Driven Platform for Side Project Collaboration (...
zz41354899
 
PDF
A Strategic Analysis of the MVNO Wave in Emerging Markets.pdf
IPLOOK Networks
 
PDF
MASTERDECK GRAPHSUMMIT SYDNEY (Public).pdf
Neo4j
 
PDF
Advances in Ultra High Voltage (UHV) Transmission and Distribution Systems.pdf
Nabajyoti Banik
 
PDF
Unlocking the Future- AI Agents Meet Oracle Database 23ai - AIOUG Yatra 2025.pdf
Sandesh Rao
 
PDF
Data_Analytics_vs_Data_Science_vs_BI_by_CA_Suvidha_Chaplot.pdf
CA Suvidha Chaplot
 
New ThousandEyes Product Innovations: Cisco Live June 2025
ThousandEyes
 
Presentation about Hardware and Software in Computer
snehamodhawadiya
 
Using Anchore and DefectDojo to Stand Up Your DevSecOps Function
Anchore
 
cloud computing vai.pptx for the project
vaibhavdobariyal79
 
Accelerating Oracle Database 23ai Troubleshooting with Oracle AHF Fleet Insig...
Sandesh Rao
 
Responsible AI and AI Ethics - By Sylvester Ebhonu
Sylvester Ebhonu
 
Structs to JSON: How Go Powers REST APIs
Emily Achieng
 
AI Unleashed - Shaping the Future -Starting Today - AIOUG Yatra 2025 - For Co...
Sandesh Rao
 
AI in Daily Life: How Artificial Intelligence Helps Us Every Day
vanshrpatil7
 
Google I/O Extended 2025 Baku - all ppts
HusseinMalikMammadli
 
OA presentation.pptx OA presentation.pptx
pateldhruv002338
 
Cloud-Migration-Best-Practices-A-Practical-Guide-to-AWS-Azure-and-Google-Clou...
Artjoker Software Development Company
 
Software Development Methodologies in 2025
KodekX
 
Make GenAI investments go further with the Dell AI Factory
Principled Technologies
 
Orbitly Pitch Deck|A Mission-Driven Platform for Side Project Collaboration (...
zz41354899
 
A Strategic Analysis of the MVNO Wave in Emerging Markets.pdf
IPLOOK Networks
 
MASTERDECK GRAPHSUMMIT SYDNEY (Public).pdf
Neo4j
 
Advances in Ultra High Voltage (UHV) Transmission and Distribution Systems.pdf
Nabajyoti Banik
 
Unlocking the Future- AI Agents Meet Oracle Database 23ai - AIOUG Yatra 2025.pdf
Sandesh Rao
 
Data_Analytics_vs_Data_Science_vs_BI_by_CA_Suvidha_Chaplot.pdf
CA Suvidha Chaplot
 

50120130406008

  • 1. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & ISSN 0976 - 6375(Online), Volume 4, Issue 6, November - December (2013), © IAEME TECHNOLOGY (IJCET) ISSN 0976 – 6367(Print) ISSN 0976 – 6375(Online) Volume 4, Issue 6, November - December (2013), pp. 78-82 © IAEME: www.iaeme.com/ijcet.asp Journal Impact Factor (2013): 6.1302 (Calculated by GISI) www.jifactor.com IJCET ©IAEME DIVISIVE HIERARCHICAL CLUSTERING USING PARTITIONING METHODS Megha Gupta M.Tech Scholar, Computer Science & Engineering, Arya College of Engg & IT Jaipur, Rajasthan, India Vishal Shrivastava Professor, Computer Science & Engineering Arya College of Engg & IT Jaipur, Rajasthan, India ABSTRACT Clustering is the process of partitioning a set of data so that the data can be divided into subsets. Clustering is implemented so that same set of data can be collected on one side and other set of data can be collected on the other end. Clustering can be done using many methods like partitioning methods, hierarchical methods, density based method. Hierarchical method creates a hierarchical decomposition of the given set of data objects. In successive iteration, a cluster is split into smaller clusters, until eventually each object is in one cluster, or a termination condition holds. In this paper, partitioning method has been used with hierarchical method to form better and improved clusters. We have used various algorithms for getting better and improved clusters. Keywords: Clustering, Hierarchical, Partitioning methods. I. INTRODUCTION Data Mining, also popularly known as Knowledge Discovery in Databases (KDD), refers to the nontrivial extraction of implicit, previously unknown and potentially useful information from data in databases. While data mining and knowledge discovery in databases (or KDD) are frequently treated as synonyms, data mining is actually part of the knowledge discovery process. 78
  • 2. International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367(Print), ISSN 0976 - 6375(Online), Volume 4, Issue 6, November - December (2013), © IAEME The Knowledge Discovery in Databases process comprises of a few steps leading from raw data collections to [1] some form of new knowledge. The iterative process consists of the following steps: Data cleaning: also known as data cleansing, it is a phase in which noise data and irrelevant data are removed from the collection. • Data integration: at this stage, multiple data sources, often heterogeneous, may be combined in a common source. • Data selection: at this step, the data relevant to the analysis is decided on and retrieved from the data collection. • Data transformation: also known as data consolidation, it is a phase in which the selected data is transformed into forms appropriate for the mining procedure. • Data mining: it is the crucial step in which clever techniques are applied to extract patterns potentially useful. • Pattern evaluation: in this step, strictly interesting patterns representing knowledge are identified based on given measures. • Knowledge representation: is the final phase in which the discovered knowledge is visually represented to the user. This essential step uses visualization techniques to help users understand and interpret the data mining results. Clustering is the organization of data in classes. However, unlike classification, in clustering, class labels are unknown and it is up to the clustering algorithm to discover acceptable classes. Clustering is also called unsupervised classification, because the classification is not dictated by given class labels. There are many clustering approaches all based on the principle of maximizing the similarity between objects in a same class (intra-class similarity) and minimizing the similarity between objects of different classes (inter-class similarity) [2]. • II. 
II. RELATED WORK

Hierarchical Clustering for Data Mining

Hierarchical methods for supervised and unsupervised data mining give a multilevel description of data. They are relevant for many applications related to information extraction, retrieval, navigation and organization. Two interpretation techniques have been used for describing the clusters:
1. Listing of prototypical data examples from the cluster.
2. Listing of typical features associated with the cluster.
The Generalizable Gaussian Mixture model (GGM) and the Soft Generalizable Gaussian Mixture model (SGGM) are addressed for supervised and unsupervised learning. These two models estimate the parameters of the Gaussian clusters with a modified EM procedure from two disjoint sets of observations, which helps in ensuring high generalization ability [3].

Procedure

The agglomerative clustering scheme is started with k clusters at level j = 1, as given by the optimized GGM model of p(x). At each higher level in the hierarchy, two clusters are merged based on a similarity measure between pairs of clusters. The procedure is repeated until the top level is reached. That is, at level j = 1 there are k clusters, and there is 1 cluster at the final level, j = 2k − 1. The natural distance measure between the cluster densities is the Kullback-Leibler (KL) divergence, since it reflects dissimilarity between the densities in the probabilistic space.
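The merging criterion above can be illustrated with the closed-form KL divergence between two Gaussian cluster densities. For simplicity this sketch uses univariate Gaussians, and the symmetrized KL divergence is an assumption of the sketch (a common choice), since the exact merge measure is not spelled out here.

```python
import math

def kl_gaussian(m0, s0, m1, s1):
    """Closed-form KL divergence KL(N0 || N1) between two univariate
    Gaussians N(m, s^2), a natural dissimilarity between cluster densities."""
    return math.log(s1 / s0) + (s0 ** 2 + (m0 - m1) ** 2) / (2 * s1 ** 2) - 0.5

def most_similar_pair(clusters):
    """Pick the pair with the smallest symmetrized KL divergence --
    the two clusters an agglomerative step would merge next."""
    best, best_d = None, float("inf")
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            (m0, s0), (m1, s1) = clusters[i], clusters[j]
            d = kl_gaussian(m0, s0, m1, s1) + kl_gaussian(m1, s1, m0, s0)
            if d < best_d:
                best, best_d = (i, j), d
    return best

# Three Gaussian clusters (mean, std): the first two are close, the third far.
clusters = [(0.0, 1.0), (0.5, 1.0), (10.0, 1.0)]
```

An agglomerative level would merge the pair returned by `most_similar_pair` and repeat until one cluster remains.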
Limitations

The drawback is that the KL divergence only admits an analytical expression at the first level of the hierarchy, while distances for the subsequent levels have to be approximated.

Automatically Labeling Hierarchical Clusters

A simple algorithm has been used that automatically assigns labels to hierarchical clusters. The algorithm evaluates candidate labels using information from the cluster, the parent cluster and corpus statistics. A trainable threshold enables the algorithm to assign just a few high-quality labels to each cluster. First, it is assumed that the algorithm has access to a general collection of documents E, representing the word distribution in general English. This English corpus is used in selecting label candidates [4].

Procedure

Given a cluster S and its parent cluster P, which includes all of the documents in S and in the sibling clusters of S, the algorithm selects labels for the cluster with the following steps:
1. Collect phrase statistics: for every unigram, bigram and trigram phrase p occurring in the cluster S, calculate the document frequency and term frequency statistics for the cluster, the parent cluster and the general English corpus.
2. Select label candidates: select the label candidates from the unigram, bigram and trigram phrases based on document frequency in the cluster and in general English.
3. Calculate the descriptive score: calculate the descriptive score for each label candidate, and then sort the label candidates by these scores.
4. Calculate the cutoff point: decide how many label candidates to display based on the descriptive scores.

Limitation

Most errors come from clusters containing small numbers of documents.
The small number of observations in small clusters can make good and bad labels indistinguishable; minor variations in vocabulary can also produce statistical features with high variance.

Fast Hierarchical Clustering and Other Applications of Dynamic Closest Pairs

This work presents data structures for dynamic closest pair problems with arbitrary distance functions, based on a technique used for Euclidean closest pairs. It shows how to insert or delete an object from an n-object set while maintaining the closest pair, in O(n log² n) time per update and O(n) space. The purpose of the paper is to show that much better bounds are possible, using data structures that are simple. When linear space is required, this represents an order-of-magnitude improvement.

Procedure

The data structure consists of a partition of the dynamic set S into k ≤ log n subsets S1, S2, …, Sk, together with a digraph Gi for each subset Si. Initially all points are in S1 and G1 has n − 1 edges. Gi may contain edges with neither endpoint in Si; if the number of edges in all graphs grows to 2n, the data structure is rebuilt by moving all points to S1 and recomputing G1. The closest pair is represented by an edge in some Gi, so the pair can be found by scanning the edges in all graphs [5].

Create Gi for a new partition Si: initially, Gi consists of a single path. Choose the first vertex of the path to be any object in Si. Then extend the path one edge at a time: when the last vertex in the path P is in Si, choose the next vertex to be its nearest neighbor in S \ P, and when the last vertex is in S \ Si, choose the next vertex to be its nearest neighbor in Si \ P. Continue until the path can no longer be extended because S \ P or Si \ P is empty.
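The path construction and edge-scan above can be sketched as follows. This is a deliberately simplified, single-level illustration (no insertions, deletions, merges or rebuilds), with objects represented as 2-D tuples and Euclidean distance standing in for the arbitrary distance function.

```python
import math

def build_path(Si, S):
    """Build the path graph G_i for subset S_i: start at some object of
    S_i, then repeatedly extend the path to the nearest neighbor,
    alternating between S \\ P and S_i \\ P as described above."""
    start = min(Si)                 # any object of S_i; min() for determinism
    path, in_path = [start], {start}
    while True:
        last = path[-1]
        pool = (S - in_path) if last in Si else (Si - in_path)
        if not pool:                # S \ P or S_i \ P is empty: stop
            break
        nxt = min(pool, key=lambda y: math.dist(last, y))
        path.append(nxt)
        in_path.add(nxt)
    return list(zip(path, path[1:]))   # the edges of the path

def closest_pair(graphs):
    """The closest pair appears as an edge of some G_i, so it is found
    by scanning the edges of all graphs."""
    return min((e for g in graphs for e in g),
               key=lambda e: math.dist(*e), default=None)

S = {(0.0, 0.0), (5.0, 5.0), (5.5, 5.0), (9.0, 1.0), (2.0, 8.0)}
edges = build_path(S, S)            # single-subset case: S_1 = S
pair = closest_pair([edges])
```

In the single-subset case the construction degenerates to a greedy nearest-neighbor path, which always contains the closest pair as an edge: whichever member of the pair enters the path first, the other is its nearest remaining neighbor and is appended immediately after.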
Merge partitions: the update operations can cause k to become too large relative to n. If so, choose subsets Si and Sj as close to equal in size as possible, with |Si| ≤ |Sj| and |Sj| / |Si| minimized. Merge these two subsets into one and create the graph Gi for the merged subset.

To insert x, create a new subset Sk+1 = {x} in the partition of S, create Gk+1, and merge partitions as necessary until k ≤ log n. To delete x, create a new subset Sk+1 consisting of all objects y such that (y, x) is a directed edge in some Gi. Remove x and all its adjacent edges from all the graphs Gi. Create the graph Gk+1 for Sk+1, and merge partitions as necessary until k ≤ log n.

Theorem: the data structure above maintains the closest pair in S in O(n) space, amortized time O(n log n) per insertion, and amortized time O(n log² n) per deletion.

Limitations

The methods that were tested involve sequential scans through memory, a behavior known to reduce the effectiveness of cached memory.

Motivation

Using hierarchical clustering, better clusters can be formed: the resulting clusters are more clearly separated, with tight bonding between the members of each cluster. That is, the clusters formed are refined using the various algorithms of hierarchical clustering.

III. PROBLEM STATEMENT

The objective of the proposed work is to perform hierarchical clustering to obtain more refined clusters with a strong relationship between members of the same cluster.

IV. PROPOSED APPROACH

In this paper, we have used the K-means algorithm and the CFNG to find better and improved clusters.

K-means Algorithm

Suppose a data set, D, contains n objects in Euclidean space. Partitioning methods distribute the objects into k clusters, C1, …, Ck, that is, Ci ⊂ D and Ci ∩ Cj = Ø for 1 ≤ i, j ≤ k, i ≠ j.
An objective function is used to assess the partitioning quality, so that objects within a cluster are similar to one another but dissimilar to objects in other clusters. That is, the objective function aims for high intracluster similarity and low intercluster similarity [6]. A centroid-based partitioning technique uses the centroid of a cluster, Ci, to represent that cluster. Conceptually, the centroid of a cluster is its center point. The centroid can be defined in various ways, such as by the mean or medoid of the objects assigned to the cluster. The difference between an object p and Ci, the representative of the cluster, is measured by dist(p, Ci), where dist(x, y) is the Euclidean distance between two points x and y.

CFNG

The colored farthest-neighbor graph (CFNG) shares many characteristics with the shared farthest neighbors (SFN) approach of Rovetta and Masulli [7]. The CFNG algorithm yields binary partitions of objects into subsets, whereas the number of subsets obtained by SFN can vary. The SFN algorithm can easily split a cluster where no natural partition exists, while the CFNG often avoids such splits.
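The centroid-based partitioning described above can be sketched with a minimal K-means. The data and parameters here are illustrative only, not this paper's experimental setup (which used Java and MySQL).

```python
import math
import random

def kmeans(points, k, iters=100, seed=1):
    """Minimal K-means: assign each object to its nearest centroid under
    Euclidean dist(p, Ci), then recompute each centroid as the cluster mean."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[i].append(p)
        new = [tuple(sum(vals) / len(vals) for vals in zip(*cl)) if cl
               else centroids[i] for i, cl in enumerate(clusters)]
        if new == centroids:        # converged: assignments are stable
            break
        centroids = new
    return centroids, clusters

# Two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.3, 7.9), (7.8, 8.2)]
centroids, clusters = kmeans(points, k=2)
```

A divisive hierarchical scheme like the one proposed here would then apply a further binary split (e.g. via the CFNG) to each resulting cluster.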
V. RESULTS

To observe the effect of hierarchical clustering, the K-means and CFNG algorithms are used, and to observe the results, the experimental setup was designed using Java and MySQL. The obtained results are compared with K-means and CFNG when executed individually.

Figure 1: Comparison of Proposed Algorithm with K-means and CFNG

VI. CONCLUSION AND FUTURE SCOPE

We have obtained better and improved clusters by using the K-means and CFNG algorithms hierarchically. The final clusters obtained are tightly bonded. In this paper, we have used 2 different algorithms for hierarchical clustering. Instead of using CFNG, we could have used another hierarchical clustering algorithm.

REFERENCES

[1] Osmar R. Zaïane, "Principles of Knowledge Discovery in Databases", 1999.
[2] Jiawei Han, Micheline Kamber, Jian Pei, "Data Mining: Concepts and Techniques".
[3] Osmar R. Zaïane, "Principles of Knowledge Discovery in Databases", 1999.
[4] Pucktada Treeratpituk, Jamie Callan, "Automatically Labeling Hierarchical Clusters".
[5] David Eppstein, "Fast Hierarchical Clustering and Other Applications of Dynamic Closest Pairs".
[6] G. Plaxton, "Approximation Algorithms for Hierarchical Location Problems", Proceedings of the 35th ACM Symposium on Theory of Computing, 2003.
[7] A. Borodin, R. Ostrovsky and Y. Rabani, "Subquadratic Approximation Algorithms for Clustering Problems in High Dimensional Spaces", Proceedings of the 31st ACM Symposium on Theory of Computing, 1999.
[8] Rinal H. Doshi, Dr. Harshad B. Bhadka and Richa Mehta, "Development of Pattern Knowledge Discovery Framework using Clustering Data Mining Algorithm", International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 3, 2013, pp.
101 - 112, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[9] Deepika Khurana and Dr. M.P.S. Bhatia, "Dynamic Approach to K-Means Clustering Algorithm", International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 3, 2013, pp. 204 - 219, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.
[10] Meghana N. Ingole, M.S. Bewoor, S.H. Patil, "Context Sensitive Text Summarization using Hierarchical Clustering Algorithm", International Journal of Computer Engineering & Technology (IJCET), Volume 3, Issue 1, 2012, pp. 322 - 329, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.