INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)

ISSN 0976 – 6367(Print)
ISSN 0976 – 6375(Online)
Volume 5, Issue 1, January (2014), pp. 141-152
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI)
www.jifactor.com


COMPARATIVE STUDY OF REMOTE SENSING DATA BASED ON
GENETIC ALGORITHM
Sugandhi Midha
M.Sc (CS) M.Phil (CS) M.Tech (CS)
Lecturer, CMR Institute of Management Studies (CMRIMS), Bangalore, Karnataka, India

ABSTRACT
In data mining, clustering is an important problem. Clustering is a technique that groups objects
based on the information found in the data describing them. Image segmentation is the process of
partitioning a digital image into multiple segments. Image segmentation techniques include edge
detection, region growing and clustering. However, both region-boundary and edge-detection based
methods often fail to produce accurate segmentation results. To overcome this, the DBSCAN
clustering algorithm was implemented. The problem with DBSCAN is that it requires values for input
parameters that are hard to determine; to overcome this, the OPTICS algorithm was implemented. In
this method the objects are processed in a specific order. Rather than imposing a precise cut-off
between categories, fuzzy logic uses truth values between 0.0 and 1.0 that represent the degree of
membership. Fuzzy rules are applied to these algorithms to improve the quality of the clusters.
The computational complexity of the above clustering algorithms is very high. To address this,
grid based algorithms are used. Hybrid approaches, i.e. combining a grid with DBSCAN and OPTICS,
can be used to reduce the computational complexity and the amount of data storage. The accuracy
and time taken for DBSCAN, OPTICS, GRID, GRID-DBSCAN and GRID-OPTICS were evaluated.
Keywords: Cluster, Genetic, DBSCAN, OPTICS, GRID DBSCAN, GRID OPTICS.
1. INTRODUCTION
There are three interconnected reasons why the effectiveness of clustering algorithms is a
problem. First, almost all clustering algorithms require values for input parameters that are hard
to determine, especially for real-world data sets containing high-dimensional objects. Second, the
algorithms are very sensitive to these parameter values, often producing very different
partitionings of the data set even for slightly different parameter settings. Third,
high-dimensional real data sets often have a very skewed distribution that cannot be revealed by a
clustering algorithm using only one global parameter setting [1]. Spatial clustering algorithms can
be partitioning based, hierarchical based, density based or grid based. The first three kinds of
method have drawbacks: for example, elements in the same cluster might not share enough similarity,
or the performance may be prohibitively poor. These algorithms perform well for data collections of
any shape, but their computational complexity is very high.
To enhance the efficiency of clustering, a grid based clustering approach [2] is used, which
relies on a grid data structure. It divides the data into a finite number of cells that form a grid
structure on which all of the clustering operations are performed. The main advantage of the
approach is its fast processing time, which is typically independent of the number of data objects
and depends only on the number of cells. In this project the DBSCAN, OPTICS and grid clustering
(STING) algorithms are implemented on the input image. By combining these clustering algorithms on
the same input image, a hybrid approach yields the new clustering algorithms GRID_DBSCAN and
GRID_OPTICS [3]. By comparing the accuracy and the time taken to form clusters, a conclusion can be
drawn as to which algorithm is best.
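As a rough illustration of the grid idea, the following Python sketch splits an image into a fixed
number of cells and keeps one summary statistic per cell, so that later steps can work on cells
rather than on individual pixels. The function name grid_cell_means, the choice of the mean
intensity as the statistic, and the decision to ignore leftover rows and columns when the image
size is not a multiple of the grid size are assumptions made for this example; they are not taken
from the project's implementation.

```python
from statistics import mean


def grid_cell_means(image, cells_per_side):
    """Split a 2-D list of pixel intensities into cells_per_side x cells_per_side
    cells and return the mean intensity of each cell as a 2-D list."""
    height, width = len(image), len(image[0])
    cell_h = height // cells_per_side
    cell_w = width // cells_per_side
    if cell_h == 0 or cell_w == 0:
        raise ValueError("image is smaller than the requested grid")
    means = []
    for gy in range(cells_per_side):
        row = []
        for gx in range(cells_per_side):
            # Collect the pixels that fall inside cell (gy, gx).
            pixels = [image[y][x]
                      for y in range(gy * cell_h, (gy + 1) * cell_h)
                      for x in range(gx * cell_w, (gx + 1) * cell_w)]
            row.append(mean(pixels))
        means.append(row)
    return means


# Hypothetical 4 x 4 image divided into a 2 x 2 grid.
example_image = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [20, 20, 30, 30],
    [20, 20, 30, 30],
]
example_means = grid_cell_means(example_image, 2)
```

Once the per-cell statistics exist, the cost of any later pass depends on the number of cells (here
4) and not on the number of pixels (here 16).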

Figure: Block Diagram of Flow of Study

2. DENSITY-BASED SPATIAL CLUSTERING OF APPLICATIONS WITH NOISE
Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering
algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu in 1996. It is a
density-based clustering algorithm because it finds a number of clusters starting from the
estimated density distribution of the corresponding nodes. In its original presentation it requires
only one input parameter and supports the user in determining an appropriate value for it. It
discovers clusters of arbitrary shape, and it is efficient even for large spatial databases. DBSCAN
is designed to discover the clusters and the noise in a spatial database and relies on a
density-based notion of a cluster: a cluster is defined as a maximal set of density-connected
points.
DBSCAN Algorithm
The DBSCAN algorithm can identify clusters in large spatial data sets by looking at the local
density of database elements, using only one input parameter. Furthermore, the user gets a
suggestion as to which parameter value would be suitable, so minimal knowledge of the domain is
required. DBSCAN can also determine what information should be classified as noise or outliers. In
addition, it is quick and scales well with the size of the database, almost linearly when region
queries are supported by a spatial index. By using the density distribution of nodes in the
database, DBSCAN can categorize these nodes into separate clusters that define the different
classes. DBSCAN can find clusters of arbitrary shape. However, clusters that lie close to each
other tend to be merged into the same class. If a pixel is found to be a dense part of a cluster,
its ε-neighborhood is also part of that cluster. Hence, all pixels found within the ε-neighborhood
are added, as are their own ε-neighborhoods when they are also dense. This process continues until
the density-connected cluster is completely found. Then a new unvisited pixel is retrieved and
processed, leading to the discovery of a further cluster or of noise.
The pixel information of the image is extracted into a list l. The algorithm DBSCAN takes as
input parameters epsilon (Eps) and the minimum number of points (MinPts). C is the number of the
current cluster. The function getNeighborPixels( ) takes a pixel P and epsilon Eps as input and
returns the pixels of list l that lie in the epsilon-neighborhood of P. The function getDistance
takes two pixels and returns the color distance between them. The function getDist takes two pixels
and returns the distance between them. getX( ) and getY( ) return the x and y coordinates of a
pixel within the image.
Algorithm DBSCAN (l, Eps, MinPts) {
    for each unvisited pixel P in list l {
        mark P as visited
        NeighborPixels = getNeighborPixels(P, Eps)
        if (sizeof(NeighborPixels) < MinPts) {
            mark P as NOISE
        } else {
            C = next cluster
            expandCluster(P, NeighborPixels, C, Eps, MinPts)
        }
    }
}
expandCluster (P, NeighborPixels, C, Eps, MinPts) {
    add P to cluster C
    for each pixel P' in NeighborPixels {
        if (P' is not visited) {
            mark P' as visited
            NeighborPixels' = getNeighborPixels(P', Eps)
            if (sizeof(NeighborPixels') >= MinPts) {
                NeighborPixels = NeighborPixels joined with NeighborPixels'
            }
        }
        if (P' is not yet member of any cluster)
            add P' to cluster C
    }
}
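The pseudocode above can be sketched in runnable form as follows. This is a minimal Python
illustration and not the project's implementation: points are plain coordinate tuples compared by
Euclidean distance (the paper's pixels also carry a color distance, which is omitted here), the
neighborhood search is a brute-force scan, and the names dbscan, region_query and NOISE are chosen
for this example only.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and NOISE for noise."""
    labels = [None] * len(points)          # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE              # may become a border point later
            continue
        cluster += 1                       # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):              # expandCluster: the seed list grows as we go
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster        # noise reached from a core point is a border point
            if labels[j] is not None:
                continue                   # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels


# Two hypothetical 2 x 2 blocks of points and one isolated point.
example_points = [(0, 0), (0, 1), (1, 0), (1, 1),
                  (10, 10), (10, 11), (11, 10), (11, 11),
                  (50, 50)]
example_labels = dbscan(example_points, eps=1.5, min_pts=3)
```

With these inputs the two blocks form clusters 0 and 1 and the isolated point is labeled as noise.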
3. ORDERING POINTS TO IDENTIFY THE CLUSTERING STRUCTURE
Ordering points to identify the clustering structure (OPTICS) is an algorithm for finding
density-based clusters in spatial data. It was presented by Mihael Ankerst, Markus M. Breunig,
Hans-Peter Kriegel and Jörg Sander. Its basic idea is similar to DBSCAN, but it addresses one of
DBSCAN's major weaknesses: the problem of detecting meaningful clusters in data of varying
density. To do so, the points of the database are (linearly) ordered such that points which are
spatially closest become neighbors in the ordering. Additionally, a special distance is stored for
each point that represents the density that needs to be accepted for a cluster in order to have
both points belong to the same cluster. An important property of many real data sets is that their
intrinsic cluster structure cannot be characterized by global density parameters. Very different
local densities may be needed to reveal clusters in different regions of the data space. For
example, in a data set containing clusters A and B together with a cluster C made up of the denser
sub-clusters C1, C2 and C3, it is not possible to detect the clusters A, B, C1, C2 and C3
simultaneously using one global density parameter. A global density-based decomposition would
consist only of the clusters A, B and C, or of C1, C2 and C3. In the second case, the objects from
A and B are noise.
OPTICS Algorithm
OPTICS requires two parameters: ε (Eps) and the minimum number of pixels required to form a
cluster (MinPts). It starts with an arbitrary pixel that has not been visited, and it stores the
core-distance and a suitable reachability-distance for each pixel. This information is sufficient
to extract, from the resulting order, all density-based clusterings with respect to any distance ε'
that is smaller than the generating distance ε. The starting pixel's ε-neighborhood is retrieved,
and if it contains sufficiently many pixels, a cluster is started. Otherwise, the pixel is labeled
as noise. Note that this pixel might later be found in a sufficiently sized ε-environment of a
different pixel and hence be made part of a cluster.
To extract a clustering from the cluster-ordering, we first check whether the
reachability-distance of the current pixel is larger than the clustering-distance ε'. In this case,
the pixel is not density-reachable with respect to ε' and MinPts from any of the pixels located
before it in the cluster-ordering. This is obvious, because if the pixel had been density-reachable
with respect to ε' and MinPts from a preceding object in the order, it would have been assigned a
reachability-distance of at most ε'. Therefore, if the reachability-distance is larger than ε', we
look at the core-distance of the pixel and start a new cluster if the pixel is a core object with
respect to ε' and MinPts; otherwise, the pixel is assigned to NOISE (note that the
reachability-distance of the first pixel in the cluster-ordering is always UNDEFINED and that we
assume UNDEFINED to be greater than any defined distance). If the reachability-distance of the
current pixel is smaller than ε', we can simply assign this pixel to the current cluster, because
then it is density-reachable with respect to ε' and MinPts from a preceding core object in the
cluster-ordering.
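The extraction rule described above can be sketched in Python as follows. The sketch assumes that
the cluster-ordering is already available as a list of (reachability-distance, core-distance)
pairs, that UNDEFINED is represented by infinity so that it compares greater than any defined
distance, and that a reachability-distance equal to ε' is treated like a smaller one. The names
extract_clusters, eps_prime and NOISE are illustrative and do not come from the original
implementation.

```python
import math

UNDEFINED = math.inf   # compares greater than any defined distance
NOISE = -1


def extract_clusters(ordering, eps_prime):
    """ordering: (reachability_distance, core_distance) pairs in cluster-order.
    Returns one cluster label per entry (NOISE for noise)."""
    labels = []
    cluster = -1
    for reach, core in ordering:
        if reach > eps_prime:
            # Not density-reachable from any preceding pixel w.r.t. eps_prime.
            if core <= eps_prime:
                cluster += 1               # core object: start a new cluster
                labels.append(cluster)
            else:
                labels.append(NOISE)
        else:
            # Density-reachable from a preceding core object: same cluster.
            labels.append(cluster)
    return labels


# Hypothetical ordering: two dense runs followed by an isolated pixel.
example_ordering = [
    (UNDEFINED, 1.0), (1.0, 1.0), (1.2, 1.1),
    (5.0, 1.0), (1.0, 1.0),
    (9.0, UNDEFINED),
]
example_clusters = extract_clusters(example_ordering, eps_prime=2.0)
```

Because the ordering is computed once, the same list can be scanned again with a different
eps_prime to obtain a clustering at another density level without recomputing any distances.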


The pixel information of the image is extracted into a list l. The algorithm OPTICS takes as
input parameters epsilon (Eps) and the minimum number of points (MinPts). C is the number of the
current cluster. The function getNeighborPixels( ) takes a pixel P and epsilon Eps as input and
returns the pixels of list l that lie in the epsilon-neighborhood of P. The function getDistance
takes two pixels and returns the color distance between them. The function getDist takes two pixels
and returns the distance between them. getX( ) and getY( ) return the x and y coordinates of a
pixel within the image.
Algorithm OPTICS (l, Eps, MinPts) {
    for each unvisited pixel P in list l {
        mark P as visited
        NeighborPixels = getNeighborPixels(P, Eps)
        if (sizeof(NeighborPixels) < MinPts) {
            mark P as NOISE
        } else {
            C = next cluster
            expandCluster(P, NeighborPixels, C, Eps, MinPts)
        }
    }
}
expandCluster (P, NeighborPixels, C, Eps, MinPts) {
    add P to cluster C
    for each pixel P' in NeighborPixels {
        if (P' is not visited) {
            mark P' as visited
            if (P.getCoreDist() != 0) {
                if (P'.getReachDist() != 0) {
                    cluster1.add(P')
                }
            }
            if (sizeof(cluster1) >= MinPts) {
                NeighborPixels = NeighborPixels joined with cluster1
            }
        }
        if (P' is not yet member of any cluster)
            add P' to cluster C
    }
}
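The pseudocode above does not show how the core-distance and reachability-distance are computed.
The following Python sketch shows one common way to build the cluster-ordering with both values,
following the general OPTICS scheme rather than the project's code. It assumes Euclidean distance
on coordinate tuples, a brute-force neighborhood search, that a point counts as a member of its own
ε-neighborhood, and that UNDEFINED is represented by infinity. All function and variable names are
illustrative.

```python
import heapq
import math

UNDEFINED = math.inf


def optics_ordering(points, eps, min_pts):
    """Return (index, reachability_distance, core_distance) triples in cluster-order."""
    n = len(points)
    reach = [UNDEFINED] * n
    processed = [False] * n
    ordering = []

    def neighbors_of(i):
        # (distance, index) of every point within eps of points[i], nearest first.
        found = []
        for j in range(n):
            d = math.dist(points[i], points[j])
            if d <= eps:
                found.append((d, j))
        return sorted(found)

    for start in range(n):
        if processed[start]:
            continue
        seeds = [(UNDEFINED, start)]           # priority queue keyed by reachability
        while seeds:
            _, i = heapq.heappop(seeds)
            if processed[i]:
                continue                       # stale queue entry, already handled
            processed[i] = True
            nbrs = neighbors_of(i)
            # Core-distance: distance to the min_pts-th nearest neighbor, if it exists.
            core = nbrs[min_pts - 1][0] if len(nbrs) >= min_pts else UNDEFINED
            ordering.append((i, reach[i], core))
            if core == UNDEFINED:
                continue                       # not a core point: nothing to expand
            for d, j in nbrs:
                if processed[j]:
                    continue
                new_reach = max(core, d)       # reachability-distance of j w.r.t. i
                if new_reach < reach[j]:
                    reach[j] = new_reach
                    heapq.heappush(seeds, (new_reach, j))
    return ordering


# Two hypothetical groups of three collinear points.
example_line = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
example_order = optics_ordering(example_line, eps=2.0, min_pts=2)
```

The reachability values of such an ordering jump to UNDEFINED at the start of each group, which is
the signal the extraction step uses to begin a new cluster.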
4. GRID-DBSCAN
To improve the accuracy of the clustering technique, this project uses a hybrid approach that
combines the DBSCAN and GRID clustering algorithms to create a new clustering algorithm,
GRID-DBSCAN. The spatial image is divided into rectangular cells. There are several levels of such
rectangular cells corresponding to different resolutions, and these cells form a hierarchical
structure. Each cell at a high level is partitioned to form a number of cells at the next lower
level. After the GRID structure has been constructed, clustering on the GRID is done in the
following way. To perform clustering on such a grid data structure, the density level must first be
supplied as an input parameter. Using this parameter, a top-down, grid-based method finds regions
with sufficient density by the following procedure. First, a layer within the hierarchical
structure is chosen; in this project the root layer is selected. For each cell in the current
layer, the confidence interval of the probability that the cell is relevant to the clustering is
computed. A cell that does not meet the confidence level is deemed irrelevant.
After the current layer has been examined, the next lower level of cells is examined by the same
process. The only difference is that, instead of going through all cells, only those cells that are
children of the relevant cells of the previous layer are processed. This procedure continues until
the lowest layer (bottom layer) is reached. At this point the regions of relevant cells are
processed. In the GRID-DBSCAN algorithm, DBSCAN is used to process the relevant bottom cells. In
this project all 6 regions (agriculture, water, greenery, urban, sea, other) are displayed in the
clustering output, so the user does not need to give any input parameter: the thresholds are
calculated from the pixel information.
Algorithm for Grid-DBSCAN
The input to this algorithm is the root layer of the grid structure. G.getG1( ), G.getG2( ),
G.getG3( ) and G.getG4( ) return the child grids of the current grid cell G. Calling
GRID_CLUSTERING( ) on a child applies the same procedure recursively to that child.
Algorithm GRID-DBSCAN (grid G) {
    if (G.getG1() == null) {
        apply DBSCAN algorithm on current bottom layer
    } else {
        if (G.getG1() satisfies threshold) {
            G.getG1().GRID_CLUSTERING()
        }
        if (G.getG2() satisfies threshold) {
            G.getG2().GRID_CLUSTERING()
        }
        if (G.getG3() satisfies threshold) {
            G.getG3().GRID_CLUSTERING()
        }
        if (G.getG4() satisfies threshold) {
            G.getG4().GRID_CLUSTERING()
        }
    }
}
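The top-down descent can be sketched in Python as below. This is only an illustration under stated
assumptions: GridCell is a hypothetical stand-in for the paper's grid object, the relevance test
and the bottom-cell clustering routine are passed in as functions (is_relevant and cluster_leaf)
because the paper derives its thresholds from pixel information that is not given here, and the
example uses a simple pixel count in place of a real threshold and of DBSCAN.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GridCell:
    pixels: List[tuple]                                        # pixels covered by this cell
    children: List["GridCell"] = field(default_factory=list)   # empty at the bottom layer


def grid_cluster(cell, is_relevant, cluster_leaf):
    """Descend from `cell`, skip irrelevant children and cluster relevant bottom cells.
    Returns one clustering result per processed bottom cell."""
    if not cell.children:
        # Bottom layer reached: hand the cell's pixels to the density-based algorithm.
        return [cluster_leaf(cell.pixels)]
    results = []
    for child in cell.children:
        if is_relevant(child):                 # only relevant children are examined further
            results.extend(grid_cluster(child, is_relevant, cluster_leaf))
    return results


# Hypothetical two-level grid: a root with four bottom cells.
example_leaves = [
    GridCell([(0, 0), (0, 1), (1, 1)]),
    GridCell([(5, 5)]),
    GridCell([(9, 9), (9, 8)]),
    GridCell([]),
]
example_root = GridCell([p for c in example_leaves for p in c.pixels], example_leaves)

# Stand-ins: a cell is "relevant" if it holds at least two pixels, and the
# "clustering" of a bottom cell is just its pixel count.
example_result = grid_cluster(example_root,
                              is_relevant=lambda c: len(c.pixels) >= 2,
                              cluster_leaf=len)
```

Passing the clustering routine as a parameter means the same descent serves both hybrids: a DBSCAN
function gives GRID-DBSCAN and an OPTICS function gives GRID-OPTICS.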
5. GRID-OPTICS
To improve the accuracy of the clustering technique, this project uses a hybrid approach that
combines the OPTICS and GRID clustering algorithms to create a new clustering algorithm,
GRID-OPTICS [3]. The spatial image is divided into rectangular cells. There are several levels of
such rectangular cells corresponding to different resolutions, and these cells form a hierarchical
structure. Each cell at a high level is partitioned to form a number of cells at the next lower
level. After the GRID structure has been constructed, clustering on the GRID is done in the
following way. To perform clustering on such a grid data structure, the density level must first be
supplied as an input parameter. Using this parameter, a top-down, grid-based method finds regions
with sufficient density by the following procedure. First, a layer within the hierarchical
structure is chosen; in this project the root layer is selected. For each cell in the current
layer, the confidence interval of the probability that the cell is relevant to the clustering is
computed. Cells that do not meet the confidence level are deemed irrelevant. After the current
layer has been examined, the next lower level of cells is examined by the same process. The only
difference is that, instead of going through all cells, only those cells that are children of the
relevant cells of the previous layer are processed. This procedure continues until the lowest layer
(bottom layer) is reached. At this point the regions of relevant cells are processed. In the
GRID-OPTICS algorithm, OPTICS is used to process the relevant bottom cells. In this project all 6
regions (agriculture, water, greenery, urban, sea, other) are displayed in the clustering output,
so the user does not need to give any input parameter: the thresholds are calculated from the
pixel information.
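Algorithm for Grid-OPTICS
Algorithm GRID-OPTICS (grid G) {
    if (G.getG1() == null) {
        apply OPTICS algorithm on current bottom layer
    } else {
        if (G.getG1() satisfies threshold) {
            G.getG1().GRID_CLUSTERING()
        }
        if (G.getG2() satisfies threshold) {
            G.getG2().GRID_CLUSTERING()
        }
        if (G.getG3() satisfies threshold) {
            G.getG3().GRID_CLUSTERING()
        }
        if (G.getG4() satisfies threshold) {
            G.getG4().GRID_CLUSTERING()
        }
    }
}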
6. PERFORMANCE EVALUATION
6.1 Kappa Statistics
Cohen's kappa coefficient [16] is a statistical measure of inter-rater agreement or
inter-annotator agreement for qualitative (categorical) items. It is generally thought to be a more
robust measure than a simple percent agreement calculation, since κ takes into account the
agreement occurring by chance.
6.2 Calculation
Cohen's kappa measures the agreement between two raters who each classify N items into C
mutually exclusive categories. The first mention of a kappa-like statistic is attributed to Galton.
The equation for κ is:

\[ \kappa = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)}, \]

Where Pr(a) is the relative observed agreement among raters, and Pr(e) is the hypothetical
probability of chance agreement, using the observed data to calculate the probabilities of each
observer randomly saying each category. If the raters are in complete agreement then κ = 1. If there
is no agreement among the raters other than what would be expected by chance (as defined by Pr(e)),
κ = 0.
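As a worked illustration of the formula, the following Python sketch computes κ from two label
sequences, for example a reference classification and a clustering output for the same pixels. The
function name cohen_kappa and the example labels are assumptions made for this illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Pr(a): relative observed agreement.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Pr(e): chance agreement from each rater's observed category frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical labels for four pixels from two raters.
rater_1 = ["water", "water", "urban", "urban"]
rater_2 = ["water", "water", "urban", "water"]
example_kappa = cohen_kappa(rater_1, rater_2)
```

In this example the raters agree on 3 of 4 items, so Pr(a) = 0.75. Rater 1 uses each category half
of the time and rater 2 says "water" three times out of four, so Pr(e) = 0.5 × 0.75 + 0.5 × 0.25 =
0.5, and κ = (0.75 − 0.5) / (1 − 0.5) = 0.5.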
6.3 Interpreting Kappa
Here is one possible interpretation of Kappa:
Poor agreement = Less than 0.20
Fair agreement = 0.20 to 0.40
Moderate agreement = 0.40 to 0.60
Good agreement = 0.60 to 0.80
Very good agreement = 0.80 to 1.0
7. EXPERIMENT RESULTS AND ANALYSIS
Figure: Original Image and clustering output for DBSCAN, OPTICS, GRID DBSCAN and GRID OPTICS

Kappa coefficient and time
Algorithm    Kappa Coefficient    Time
DBSCAN       0.6299               02:02:39:660
OPTICS       0.7116               02:02:28:060

Kappa coefficient for Grid DBSCAN and Grid OPTICS (grid sizes 8*8, 16*16, 32*32)
0.8991    0.901    0.9026    0.9083    0.9083    0.9083

Accuracy analysis of discretization in WEKA by using 5 bins
Algorithm      Accuracy
DBSCAN         0.5234
OPTICS         0.6374
GRID           0.674
GRID-DBSCAN    0.8378
GRID-OPTICS    0.8436

Best fitness values in genetic algorithm
Algorithm              Best fitness
DBSCAN with GA         0.045647
OPTICS with GA         0.029049
GRID-DBSCAN with GA    0.49758
GRID-OPTICS with GA    0.99575

Comparison of Clustering Output

Figure: Clustering Output for DBSCAN and OPTICS (1 – 5)


Figure: Clustering Output for GRID, GRID DBSCAN and GRID OPTICS (1 – 5)


8. CONCLUSION
Among the density based clustering algorithms, OPTICS gives better results than DBSCAN for the
same input parameters, and this was observed for several different parameter settings. OPTICS is a
generalization of DBSCAN to multiple ranges, effectively replacing the ε parameter with a maximum
search radius that mostly affects performance. The grid structure for the input image was
implemented to perform the clustering, and the statistical information for each grid cell of the
image was calculated. The grid approach has a much lower computational cost than the other
approaches. The I/O cost is low since only the grid data structure is stored in memory. Both of
these speed up the clustering. In addition, the approach offers an opportunity for parallelism. All
these advantages stem from the hierarchical structure of grid cells and the statistical information
associated with them.
Three grid sizes were used: 8*8, 16*16 and 32*32. Of these, the 32*32 grid gives better results
than the other grid sizes. The DBSCAN and OPTICS clustering algorithms are compared with the hybrid
algorithms GRID-DBSCAN and GRID-OPTICS on the same input image, and the results indicate that
accuracy is improved by the hybrid algorithms. Overall, GRID-OPTICS gives the highest performance
among the clustering algorithms considered. From Table 10.7, image 5 and image 7 give good
performance for GRID-OPTICS with GA compared to DBSCAN with GA, OPTICS with GA, GRID with GA and
GRID-DBSCAN with GA. For image 5 and image 7 the distance between the pixels is larger, which is
why combining with the genetic algorithm improved the performance. For the remaining images, the
performance improvement came only from combining DBSCAN and OPTICS with GRID clustering.
Fuzzy logic proved to be an excellent choice for image classification applications since it
mimics human control logic. It uses an imprecise but very descriptive language to deal with input
data, much as a human operator would. Fuzzy rules are very helpful in clustering techniques for
better classification, which is why they are used to classify the class information of the input
images in the different clustering techniques.
In this project an unsupervised discretization in WEKA is used for the accuracy calculation; it
splits the whole range of numbers into intervals of equal size. The number of bins was taken as 5,
15 and 25. Of these, discretization using 25 bins gave better results than 5 and 15 bins. It was
also carried out for 25, 20 and 15 bins, but this did not make much difference to the results.
Discretization using 25 bins therefore gave good results for all clustering algorithms.
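Equal-width discretization of the kind described above can be sketched in a few lines of Python.
This is a generic illustration of the technique and not a reproduction of WEKA's filter; the
function name equal_width_bins and the convention that the maximum value falls into the last bin
are assumptions of this example.

```python
def equal_width_bins(values, n_bins):
    """Map each value to a bin index 0..n_bins-1 using intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)           # all values identical: a single bin suffices
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so it is clamped into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical values 0..10 split into 5 bins of width 2.
example_bins = equal_width_bins(list(range(11)), 5)
```

Increasing n_bins narrows each interval, which is the only thing that changes between the 5, 15
and 25 bin settings.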
This project was implemented on a single image; future enhancements can be implemented on image
data sets. In the GRID clustering algorithm all the bottom grid cells have the same size; this can
be extended to variable cell sizes that form a hierarchical structure. The density based methods
DBSCAN and OPTICS can be extended to other density based methods such as DENCLUE (Clustering Based
on Density Distribution Functions). Along with CPU time and accuracy, other parameters can be taken
for comparison of the algorithms.
REFERENCES
[1] Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Jörg Sander, "OPTICS: Ordering
Points to Identify the Clustering Structure", Proc. ACM SIGMOD'99 Int. Conf. on Management of
Data, Philadelphia PA, 1999.
[2] Changzhen Hu, Jiadong Ren, and Lili Meng, "CABGD: An Improved Clustering Algorithm Based on
Grid-Density", Fourth International Conference on Innovative Computing, Information and Control,
Pages 381-384, 2009.
[3] Bian Fuling and Ming, "A Grid and Density Based Fast Spatial Clustering Algorithm",
International Conference on Artificial Intelligence and Computational Intelligence, Pages 260-263,
November 2009.
[4] Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu, "A Density-Based Algorithm for
Discovering Clusters in Large Spatial Databases with Noise", Proceedings of 2nd International
Conference on Knowledge Discovery and Data Mining (KDD-96), 1996.
[5] Y. Tarabalka, J. A. Benediktsson, J. Chanussot, "Spectral-Spatial Classification of
Hyperspectral Imagery Based on Partitional Clustering Techniques," IEEE Trans. Geosci. Remote
Sensing, vol. 47, no. 8, pp. 2973-2987, 2009.
[6] D. Stavrakoudis, G. Galidaki, I. Gitas, J. Theocharis, "A Genetic Fuzzy-Rule-Based Classifier
for Land Cover Classification from Hyperspectral Imagery," IEEE Trans. Geosci. Remote Sensing,
vol. 50, no. 1, pp. 130-148, 2012.
[7] S. Derivaux, G. Forestier, C. Wemmert, S. Lefevre, "Supervised image segmentation using
watershed transform, fuzzy classification and evolutionary computation," Pattern Recognit. Lett.
31, pp. 2364-2374, 2010.
[8] R. Q. Feitosa, G. A. Costa, T. B. Cazes, B. Feijo, "A genetic approach for the automatic
adaptation of segmentation parameters," Int. Conf. Object-based image analysis, 2006.
[9] S. Bhandarkar, H. Zhang, "Image Segmentation Using Evolutionary Computation," IEEE Trans.
Evol. Comput, vol. 3, no. 1, pp. 1-21, 1999.
[10] K. Melkemi, M. Batouche, S. Foufou, "A multiagent system approach for image segmentation
using genetic algorithms and extremal optimization heuristics," Pattern Recognit. Lett. 27,
pp. 1230-1238, 2006.
[11] S. Chen, D. Zhang, "Robust Image Segmentation Using FCM With Spatial Constraints Based on New
Kernel-Induced Distance Measure," IEEE Trans. Sys. Man. And Cyber, vol. 34, no. 4, pp. 1907-1916,
2004.
[12] G. Heo, P. Gader, "An Extension of Global Fuzzy C-means Using Kernel Methods," IEEE World
Cong. Comp. Intel., 18-23 July 2010.
[13] R. Inokuchi, T. Nakamura, S. Miyamoto, "Kernelized Cluster Validity Measures and Application
to Evaluation of Different Clustering Algorithms," IEEE Int. Conf. Fuzzy Systems, 16-21 July 2006.
[14] Java Complete Reference.
[15] A. K. Jain, Fundamentals of Digital Image Processing.
[16] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Harcourt India.
[17] R. Edbert Rajan and Dr. K. Prasadh, "Spatial and Hierarchical Feature Extraction Based on Sift
for Medical Images", International Journal of Computer Engineering & Technology (IJCET), Volume 3,
Issue 2, 2012, pp. 308-322, ISSN Print: 0976-6367, ISSN Online: 0976-6375.
[18] Gunwanti S. Mahajan and Kanchan S. Bhagat, "Medical Image Segmentation using Enhanced K-Means
and Kernelized Fuzzy C-Means", International Journal of Electronics and Communication Engineering
& Technology (IJECET), Volume 4, Issue 6, 2013, pp. 62-70, ISSN Print: 0976-6464, ISSN Online:
0976-6472.
[19] http://www.cs.utexas.edu/users/ml/tutorials/WEKA-tut/.
[20] http://en.wikipedia.org/wiki/Erdas_Imagine.
