Genetic Algorithm for optimization on IRIS Dataset REPORT pdf

SAVITRIBAI PHULE PUNE UNIVERSITY
A MINI PROJECT REPORT ON
OPTIMIZATION OF GENETIC ALGORITHM USING IRIS
FLOWER DATASET
SUBMITTED TOWARDS THE
PARTIAL FULFILLMENT OF THE REQUIREMENTS OF
BACHELOR OF ENGINEERING (Computer Engineering)
BY
Sunil Rajput Exam No: 71720728F
Ashish Kumar Singh Exam No: 71324943K
Ashish Yadav Exam No: 71741665J
Mayank Patil Exam No: 71550097L
Under The Guidance of
Prof. Mangesh Ghonge
DEPARTMENT OF COMPUTER ENGINEERING
SANDIP INSTITUTE OF TECHNOLOGY AND RESEARCH
CENTRE
MAHIRAVANI, TRIMBAK ROAD, NASHIK 422213

SANDIP INSTITUTE OF TECHNOLOGY AND RESEARCH CENTRE
DEPARTMENT OF COMPUTER ENGINEERING
CERTIFICATE
This is to certify that the Project Entitled
OPTIMIZATION OF GENETIC ALGORITHM USING IRIS
FLOWER DATASET
Submitted by
Sunil Rajput Exam No: 71720728F
Ashish Kumar Singh Exam No: 71324943K
Ashish Yadav Exam No: 71741665J
Mayank Patil Exam No: 71550097L
is a bonafide work carried out by the students under the supervision of Prof. Mangesh
Ghonge, and it is submitted towards the partial fulfillment of the requirements of the
Bachelor of Engineering (Computer Engineering) Project.
Prof. Mangesh Ghonge Prof. A. D. Potgantwar
Internal Guide H.O.D
Dept. of Computer Engg. Dept. of Computer Engg.

SITRC, Department of Computer Engineering 2019-20 I
Abstract
Machine learning is at the core of Artificial Intelligence (AI), and pattern recognition is
another important branch of AI. This report introduces the concept of machine learning
and several machine learning algorithms, including a simple and typical one called
K-means. A case study on Iris classification shows how K-means works in pattern
recognition.
The Iris flower data set, or Fisher's Iris data set, is a multivariate data set introduced
by the British statistician and biologist Ronald Fisher in his 1936 paper "The use of
multiple measurements in taxonomic problems" as an example of linear discriminant
analysis. It is sometimes called Anderson's Iris data set because Edgar Anderson
collected the data to quantify the morphologic variation of Iris flowers of three
related species. Two of the three species were collected in the Gaspe Peninsula, all from
the same pasture, picked on the same day and measured at the same time by the
same person with the same apparatus. The data set consists of 50 samples from each of
three species of Iris: 1) Iris Setosa, 2) Iris Virginica and 3) Iris Versicolor. Four
features were measured from each sample: 1) Sepal Length, 2) Sepal Width,
3) Petal Length and 4) Petal Width. All four features are measured in
centimeters. Based on the combination of these four features, the species of a
sample can be predicted. The aim of the case study is to design and implement a
pattern recognition system for the Iris flower based on machine learning. This project
shows the workflow of pattern recognition and how to use a machine learning
approach to achieve this goal. The data set was collected from an open source
machine learning website. The programming language used in this project is
Python.
Keywords : Genetic Algorithm Optimization, Iris Dataset, Machine Learning,
Python.

Acknowledgments
It gives us great pleasure to present the mini project report on OPTIMIZATION
OF GENETIC ALGORITHM USING IRIS FLOWER DATASET.
We would like to take this opportunity to thank our internal guide, Prof. Mangesh
Ghonge, for giving us all the help and guidance we needed. We are really grateful
for this kind support, and the valuable suggestions we received were very helpful.
We are also grateful to Prof. A. D. Potgantwar, Head of the Computer Engineering
Department, Sandip Institute of Technology and Research Centre, for his indispensable
support and suggestions.
Finally, our special thanks go to Prof. Gokul Patil for providing various resources
for our project, such as a laboratory with all the needed software platforms and a
continuous Internet connection.
Sunil Rajput
Ashish Kumar Singh
Ashish Yadav
Mayank Patil
(B.E. Computer Engg.)

INDEX
1 Synopsis
1.1 Project Title
1.2 Project Option
1.3 Internal Guide
1.4 Sponsorship and External Guide
1.5 Technical Keywords (As per ACM Keywords)
1.6 Problem Statement
1.7 Abstract
1.8 Goals and Objective
1.9 Relevant mathematics associated with the Project
1.10 Names of Conferences / Journals where papers can be published
1.11 Review of Conference/Journal Papers supporting Project idea
1.12 Plan of Project Execution
2 Technical Keywords
2.1 Area of Project
2.2 Technical Keywords
3 Introduction
3.1 Project Idea
3.2 Motivation of the Project
3.3 Literature Survey
4 Problem Definition and Scope
4.1 Problem Statement
4.1.1 Goals and Objectives
4.1.2 Major Constraints
4.2 Methodologies
4.3 Outcome
4.4 Applications
4.5 Hardware Resources Required
4.6 Software Resources Required
5 Methodology
5.1 Methodology
5.1.1 Iris Dataset & Algorithms
5.1.2 Implementation/Results
5.2 Team Organization
5.2.1 Team Structure
6 Summary & Conclusion
7 References

1.1 PROJECT TITLE
OPTIMIZATION OF GENETIC ALGORITHM USING IRIS FLOWER DATASET
1.2 PROJECT OPTION
Mini project
1.3 INTERNAL GUIDE
Prof. Mangesh Ghonge
1.4 SPONSORSHIP AND EXTERNAL GUIDE
SITRC Computer Department
1.5 TECHNICAL KEYWORDS (AS PER ACM KEYWORDS)
Genetic Algorithm Optimization, Iris Dataset, Machine Learning, Python.
1.6 PROBLEM STATEMENT
To apply a Genetic Algorithm for optimization on a dataset obtained from the UCI
ML repository.
For example: the Iris dataset, using Python.
1.7 ABSTRACT
Machine learning is at the core of Artificial Intelligence (AI), and pattern recognition is
another important branch of AI. This report introduces the concept of machine learning
and several machine learning algorithms, including a simple and typical one called
K-means. A case study on Iris classification shows how K-means works in pattern
recognition.
The Iris flower data set, or Fisher's Iris data set, is a multivariate data set introduced
by the British statistician and biologist Ronald Fisher in his 1936 paper "The use of
multiple measurements in taxonomic problems" as an example of linear discriminant
analysis. It is sometimes called Anderson's Iris data set because Edgar Anderson
collected the data to quantify the morphologic variation of Iris flowers of three
related species. Two of the three species were collected in the Gaspe Peninsula, all from
the same pasture, picked on the same day and measured at the same time by the
same person with the same apparatus. The data set consists of 50 samples from each of
three species of Iris: 1) Iris Setosa, 2) Iris Virginica and 3) Iris Versicolor. Four
features were measured from each sample: 1) Sepal Length, 2) Sepal Width,
3) Petal Length and 4) Petal Width. All four features are measured in
centimeters. Based on the combination of these four features, the species of a
sample can be predicted.
1.8 GOALS AND OBJECTIVE
The aim of the case study is to design and implement a pattern recognition system
for the Iris flower based on machine learning. This project shows the workflow of
pattern recognition and how to use a machine learning approach to achieve this goal.
The data set was collected from an open source machine learning website. The
programming language used in this project is Python.
1.9 REVIEW OF CONFERENCE/JOURNAL PAPERS SUPPORTING
PROJECT IDEA
Li Liu, Murat Kantarcioglu and Bhavani Thuraisingham discussed securing data using a
decision tree algorithm. The data is classified using the perturbed data set, and this
process improves accuracy. It also reduces the costs of communication and computation
compared with cryptographic services. They also provide a direction for mapping the
data mining functions instead of reconstructing the original data, which provides more
privacy at less cost [3].
Ahmad Ashari, Paryudi and Min Tjoa describe the performance of various classification
algorithms for an alternative design in an energy simulation tool. This shows that it is
possible to compare multiple algorithms. In their comparison of the decision tree, naive
Bayes and K-Nearest Neighbour algorithms, the decision tree is more accurate than
the other algorithms [4].
Sagar S. Nikam presents a comparative study of classification techniques, which
mainly focuses on the performance analysis of classification algorithms and their
limitations, and on classifying data into different classes according to some constraint.
The first approach is the statistical approach, a classical approach that works on
linear discrimination. The second is machine learning, which helps to solve more
complex problems. The third is the neural network approach, which draws on diverse
sources ranging from understanding and emulating the human brain to broader issues
of human abilities [6].
Rachna Raghuwanshi describes the performance of the Naïve Bayes classifier and the
Decision Tree on the Fire Data Set in order to compare their accuracy, while avoiding
the problem with cross validation [7].
Xhemali, J. Hinde and G. Stone focus on the automatic analysis and classification of
attribute data from training course web pages. They choose the Naive Bayes, Decision
Tree and Neural Network algorithms to classify the same data set. According to their
results, Naive Bayes is more accurate than the other classification algorithms [8].
Bhaskar N. Patel, Satish G. Prajapati and Dr. Kamaljit I. Lakhtaria describe
classification as the categorization of data into different categories based on some
rules. Classifying data with a decision tree gives a pictorial view, makes categorizing
easier, and is more accurate than other classification algorithms [11].
Learning is a very important feature of Artificial Intelligence. Many scientists
have tried to give a proper definition of learning. However, learning is
not easy to capture in a few simple sentences. Computer scientists,
sociologists, logicians and other scientists have discussed it for a long time.
Some think learning is an adaptive skill that lets a system perform
a similar task better the next time (Simon 1987). Others claim that learning
is a process of collecting knowledge (Feigenbaum 1977). Even though there is
no agreed definition of learning, we still need a definition of
machine learning. In general, machine learning studies how
computer algorithms can be improved automatically through
experience (Mitchell 1997).
Machine learning has an important position in the field of Artificial Intelligence.
At the beginning of the development of AI, systems did
not have a thorough learning ability, so they were far from perfect. For
instance, a computer could not adjust itself when it faced problems.
Moreover, it could not automatically collect and discover new
knowledge. The program's inference relied on deduction and needed more induction.
Therefore, the computer could only work out already existing truths. It did not have
the ability to discover new logical theories, rules and so on.
1.10 PLAN OF PROJECT EXECUTION
Using Planner or a similar project management tool.

CHAPTER 3
INTRODUCTION

3.1 PROJECT IDEA
Applying Genetic Algorithm Optimization using Iris Dataset in Python.
3.2 MOTIVATION OF THE PROJECT
3.3 LITERATURE SURVEY
Li Liu, Murat Kantarcioglu and Bhavani Thuraisingham discussed securing data using a
decision tree algorithm. The data is classified using the perturbed data set, and this
process improves accuracy. It also reduces the costs of communication and computation
compared with cryptographic services. They also provide a direction for mapping the
data mining functions instead of reconstructing the original data, which provides more
privacy at less cost [3].
Ahmad Ashari, Paryudi and Min Tjoa describe the performance of various classification
algorithms for an alternative design in an energy simulation tool. This shows that it is
possible to compare multiple algorithms. In their comparison of the decision tree, naive
Bayes and K-Nearest Neighbour algorithms, the decision tree is more accurate than
the other algorithms [4].
Sagar S. Nikam presents a comparative study of classification techniques, which
mainly focuses on the performance analysis of classification algorithms and their
limitations, and on classifying data into different classes according to some constraint.
The first approach is the statistical approach, a classical approach that works on
linear discrimination. The second is machine learning, which helps to solve more
complex problems. The third is the neural network approach, which draws on diverse
sources ranging from understanding and emulating the human brain to broader issues
of human abilities [6].
Rachna Raghuwanshi describes the performance of the Naïve Bayes classifier and the
Decision Tree on the Fire Data Set in order to compare their accuracy, while avoiding
the problem with cross validation [7].
Xhemali, J. Hinde and G. Stone focus on the automatic analysis and classification of
attribute data from training course web pages. They choose the Naive Bayes, Decision
Tree and Neural Network algorithms to classify the same data set. According to their
results, Naive Bayes is more accurate than the other classification algorithms [8].
Bhaskar N. Patel, Satish G. Prajapati and Dr. Kamaljit I. Lakhtaria describe
classification as the categorization of data into different categories based on some
rules. Classifying data with a decision tree gives a pictorial view, makes categorizing
easier, and is more accurate than other classification algorithms [11].
Learning is a very important feature of Artificial Intelligence. Many scientists
have tried to give a proper definition of learning. However, learning is
not easy to capture in a few simple sentences. Computer scientists,
sociologists, logicians and other scientists have discussed it for a long time.
Some think learning is an adaptive skill that lets a system perform
a similar task better the next time (Simon 1987). Others claim that learning
is a process of collecting knowledge (Feigenbaum 1977). Even though there is
no agreed definition of learning, we still need a definition of
machine learning. In general, machine learning studies how
computer algorithms can be improved automatically through
experience (Mitchell 1997).

CHAPTER 4
PROBLEM DEFINITION AND SCOPE

4.1 PROBLEM STATEMENT
To apply a Genetic Algorithm for optimization on a dataset obtained from the UCI
ML repository.
For example: the Iris dataset, using Python.
4.1.1 Goals and objectives
4.1.2 Major Constraints
The approach is not suitable for incomplete datasets.
4.4 OUTCOME
These results show the effect that the number of clusters k and the random
initialization have on the clustering result. They also show the advantages and
disadvantages of the K-means clustering algorithm.
4.5 APPLICATIONS
Software engineering.
Traveling Salesman Problem.
Mobile communications infrastructure optimization.
Electronic circuit design, known as Evolvable hardware.

4.6 HARDWARE RESOURCES REQUIRED
1. Desktop
2. Inbuilt Compiler
3. Anaconda / Simulink tool
4. NumPy
5. Iris Dataset
Sr. No.   Parameter   Minimum Requirement   Justification
1         CPU Speed   2 GHz                 Remark Required
2         RAM         3 GB                  Remark Required
Table 4.1: Hardware Requirements
4.7 SOFTWARE RESOURCES REQUIRED
Platform:
1. Operating System: Windows, Ubuntu/Linux
2. Programming Language: Python, with Anaconda, NumPy, etc.

CHAPTER 5
METHODOLOGY

5.1 THE DESCRIPTION OF MACHINE LEARNING FORMS
Learning is a complicated topic that takes many different forms. Just as
people study in different ways, so do machines. Machine learning systems can be
categorized by different criteria. In general, learning problems fall into
two main categories: supervised learning and unsupervised learning.
Supervised learning is a commonly used machine learning approach which appears
in many different fields of computer science. In supervised learning, the
computer establishes a learning model based on the training data set. Using this
learning model, it can predict or analyze new information.
By using suitable algorithms, it can search for the best result and reduce the error rate
on its own. Supervised learning is mainly used for two kinds of task: classification
and regression.
In supervised learning, each sample that a developer gives the computer
is labeled with its class. The computer
analyzes these samples to gain learning experience, so that the error rate is reduced
when the classifier recognizes each pattern.
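The supervised workflow described above can be sketched in a few lines of plain Python. This is only an illustration, not the method used in this project: it assumes a nearest-centroid classifier (a technique chosen here for brevity), and the nine labeled rows are a small hand-transcribed sample in the style of the Iris data whose exact values should be treated as assumptions.

```python
from math import dist

# (sepal length, sepal width, petal length, petal width) in cm, with a label.
# Small illustrative sample; the exact values are assumptions.
TRAIN = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
]
TEST = [
    ((4.7, 3.2, 1.3, 0.2), "setosa"),
    ((6.9, 3.1, 4.9, 1.5), "versicolor"),
    ((7.1, 3.0, 5.9, 2.1), "virginica"),
]


def fit_centroids(samples):
    """Learn one mean feature vector (centroid) per class label."""
    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the sample."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


def error_rate(centroids, samples):
    """Fraction of labeled samples that the model classifies wrongly."""
    wrong = sum(predict(centroids, x) != y for x, y in samples)
    return wrong / len(samples)


model = fit_centroids(TRAIN)
print("held-out error rate:", error_rate(model, TEST))
```

The labeled rows play the role of the training data set, and the error rate is measured on rows that were not used for fitting.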
1.1 Iris Flower Species
The Iris flower data set, or Fisher's Iris data set, is a multivariate data set introduced by
the British statistician and biologist Ronald Fisher in his 1936 paper "The use of multiple
measurements in taxonomic problems" as an example of linear discriminant analysis. It
is sometimes called Anderson's Iris data set because Edgar Anderson collected the data
to quantify the morphologic variation of Iris flowers of three related species. Two of the
three species were collected in the Gaspe Peninsula, all from the same pasture, picked on
the same day and measured at the same time by the same person with the same apparatus.
The data set consists of 50 samples from each of three species of Iris: 1) Iris Setosa,
2) Iris Virginica and 3) Iris Versicolor. Four features were measured from each sample:
1) Sepal Length, 2) Sepal Width, 3) Petal Length and 4) Petal Width. All four
features are measured in centimeters. Based on the combination of these four
features, the species of a sample can be predicted.
2. IMPLEMENTATION OF ALGORITHMS
2.1 K-Nearest Neighbors Algorithm
The k-Nearest Neighbors algorithm (or kNN for short) is an easy algorithm to understand
and to implement, and a powerful tool to have at your disposal. The implementation will
be specific for classification problems and will be demonstrated using the Iris flowers
classification problem.
5.1.1 What is k-Nearest Neighbors
The model for kNN is the entire training dataset. When a prediction is required for an
unseen data instance, the kNN algorithm searches the training dataset for the
k most similar instances. The prediction attribute of these most similar instances is
summarized and returned as the prediction for the unseen instance. The similarity
measure depends on the type of data. For real-valued data, the Euclidean distance
can be used. For other types of data, such as categorical or binary data, the Hamming
distance can be used. In regression problems, the average of the predicted attribute
may be returned. In classification, the most prevalent class may be returned.
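As a minimal sketch of the procedure just described (Euclidean distance, then a majority vote among the k nearest rows), the following uses only the Python standard library. The function name knn_predict and the nine-row sample are illustrative assumptions, not the code used in this project.

```python
from collections import Counter
from math import dist

# (sepal length, sepal width, petal length, petal width) in cm, with a label.
# Small illustrative sample; the exact values are assumptions.
TRAIN = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((4.7, 3.2, 1.3, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.9, 3.1, 4.9, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
    ((7.1, 3.0, 5.9, 2.1), "virginica"),
]


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training rows."""
    # The "model" is simply the stored training data, sorted per query.
    neighbors = sorted(train, key=lambda row: dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


print(knn_predict(TRAIN, (5.0, 3.4, 1.5, 0.2)))
print(knn_predict(TRAIN, (6.5, 3.0, 5.5, 1.8)))
```

For a regression problem, the vote would be replaced by the average of the neighbors' target values.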
5.1.2 How does k-Nearest Neighbors Work
The kNN algorithm belongs to the family of instance-based, competitive learning and
lazy learning algorithms. Instance-based algorithms are those algorithms that model the
problem using data instances (or rows) in order to make predictive decisions. The kNN
algorithm is an extreme form of instance-based method because all training
observations are retained as part of the model.
It is a competitive learning algorithm, because it internally uses competition between
model elements (data instances) in order to make a predictive decision. The objective
similarity measure between data instances causes each data instance to compete to the

Lazy learning refers to the fact that the algorithm does not build a model until the time
that a prediction is required. It is lazy because it only does work at the last second. This
has the benefit of only including data relevant to the unseen data, called a localized
model. A disadvantage is that it can be computationally expensive to repeat the same or
similar searches over larger training datasets.
Finally, kNN is powerful because it does not assume anything about the data, other than
that a distance measure can be calculated consistently between any two instances. As such,
it is called non-parametric or non-linear, as it does not assume a functional form.
3. LOGISTIC REGRESSION ALGORITHM
Logistic Regression is a type of regression that predicts the probability of occurrence of
an event by fitting data to a logit function (logistic function). Like many forms of
regression analysis, it makes use of several predictor variables that may be either
numerical or categorical. For instance, the probability that a person has a heart attack
within a specified time period might be predicted from knowledge of the person's age,
sex and body mass index. This regression is widely used in scenarios such as predicting
a customer's propensity to purchase a product or cancel a subscription in
marketing applications, and many others.
3.1 What is Logistic Regression?
Logistic Regression, also known as Logit Regression or the Logit Model, is a mathematical
model used in statistics to estimate the probability of an event occurring,
given some previous data. Logistic Regression works with binary data, where either
the event happens (1) or the event does not happen (0). Given some feature x, it tries
to find out whether some event y happens or not, so y can be either 0 or 1. If the event
happens, y is given the value 1. If the event does not happen, y is
given the value 0. For example, if y represents whether a sports team wins a match,
then y will be 1 if they win the match and 0 if they do not. This is known as
Binomial Logistic Regression.
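To make the mapping from features to a probability concrete, here is a minimal sketch of the logistic function and a binary decision rule. The weight and bias values are hypothetical, hand-picked numbers (they are not fitted to any data), and the choice of petal length as the single feature is an assumption made only for illustration.

```python
from math import exp


def sigmoid(z):
    """Logistic function: maps any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def predict_proba(features, weights, bias):
    """Estimated probability that the event happens (y = 1)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_label(features, weights, bias, threshold=0.5):
    """Binary decision: 1 if the probability reaches the threshold, else 0."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical parameters, NOT learned from data: one feature (petal length
# in cm), where y = 1 is taken to mean "not Iris Setosa".
WEIGHTS = [1.0]
BIAS = -2.5

print(predict_proba([1.4], WEIGHTS, BIAS), predict_label([1.4], WEIGHTS, BIAS))
print(predict_proba([4.7], WEIGHTS, BIAS), predict_label([4.7], WEIGHTS, BIAS))
```

In practice the weights and bias would be estimated from labeled training data, for example by maximum likelihood, rather than chosen by hand.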

4. Implementation
4.1 Python
Python is a programming language whose development was started by Guido van Rossum
in 1989. Python is an interpreted, object-oriented, dynamically typed, high-level
programming language (Python Software Foundation 2013). Its programming style is
simple and clear, and it provides a powerful set of classes. Moreover, Python
can easily be combined with other programming languages, such as C or C++.
As a successful programming language, it has its own advantages:
1. Simple & easy to learn: The concepts of this programming language are as simple
as they can be. That makes it easy for everyone to learn and use. The syntax is
easy to understand.
2. Open source: Python is completely free, as it is open source software. Several
open source scientific computing libraries provide an API for Python. Users can
easily install Python on their own computer and use the standard and extension
libraries.
3. Extensibility: Programmers can write their code in C or C++ and run it from
Python.
4.2 SciKit-learn
Scikit-learn is an open source machine learning library for the Python programming
language. It features various classification, regression, and clustering algorithms and
is designed to interoperate with the Python numerical libraries NumPy and SciPy
(Pedregosa et al. 2011). Scikit-learn contains a Python implementation of the K-means
algorithm, which helps to figure out how to implement this algorithm in a program.

4.3 Numpy, Scipy and Matplotlib
Python's built-in list type is not designed for numerical array computation. To work
with arrays in Python, NumPy and SciPy are the essential libraries for analyzing and
calculating data. Both are open source libraries. NumPy is mainly used for
matrix calculation. SciPy is built on NumPy and is mainly used for
scientific computing.
They can be made available in a Python program with two simple commands:
>>> import numpy
>>> import scipy
Python can then call the methods of numpy and scipy.
Matplotlib is a well-known plotting library for Python. It provides a range of APIs and
is suitable for interactive plotting. In this case, we use it to find the
best result visually.
4.4 Preparing the Iris flower data set
The Iris flower data set can be found in the UCI Machine Learning Repository (Bache
& Lichman 2013). It can also be found in the Scikit-learn library. In site-packages,
there is a folder named sklearn. In this folder, a datasets subfolder
contains many kinds of data sets for machine learning study.
The data set can be found in Appendix 1.
In the species column of this table, 0 represents setosa, 1 represents versicolor and
2 represents virginica.
In the process of preparing a training data set and a testing data set, the greatest
problem is how to find the most appropriate way to divide the data set into training
data set and testing data set. In some cases, by using sampling theory and estimation
theory, we can separate the whole data set into training data set and testing data set.
However, sometimes, the method would be changed. The attributes and the property

K-means is an unsupervised learning algorithm, so it does not learn from a labeled
training data set. Therefore, there is no need to separate the dataset into a
training data set and a testing data set. The whole dataset can simply be used to
obtain the clustering result.
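A minimal sketch of K-means on unlabeled data is given below. It is a plain-Python version of Lloyd's algorithm written for illustration only, not the Scikit-learn implementation, and the nine rows and the choice of starting points are assumptions. Note that no species labels are passed to the algorithm.

```python
from math import dist

# Unlabeled rows: (sepal length, sepal width, petal length, petal width) in cm.
# Small illustrative sample; the exact values are assumptions.
POINTS = [
    (5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2), (4.7, 3.2, 1.3, 0.2),
    (7.0, 3.2, 4.7, 1.4), (6.4, 3.2, 4.5, 1.5), (6.9, 3.1, 4.9, 1.5),
    (6.3, 3.3, 6.0, 2.5), (5.8, 2.7, 5.1, 1.9), (7.1, 3.0, 5.9, 2.1),
]


def kmeans(points, init_indices, max_iter=100):
    """Lloyd's algorithm, starting from the given points as centroids."""
    centroids = [points[i] for i in init_indices]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clusters are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


labels, centroids = kmeans(POINTS, init_indices=[0, 3, 6])
print(labels)
```

The cluster numbers are arbitrary identifiers; they only coincide with species when the clusters happen to match the species groups.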
4.5 Machine learning system design
In general, machine learning system design should address two basic
requirements:
the selection and creation of the model, and
the selection and design of the learning algorithm.
In addition, different models can have different learning systems, and
the objective function also differs between learning models. The objective
function helps the machine to establish a learning system. Moreover, the accuracy
and complexity of the algorithm are the most important factors of the
learning system. If the chosen algorithm is not well suited to the learning system,
the efficiency and results of the learning system suffer. The selection
of the training data set influences learning performance and feature
selection.
ILLUSTRATION OF SAMPLE IRIS DATASET
Sample datasets of Iris Setosa
Sample datasets of Iris Versicolor
Sample datasets of Iris Virginica

5. Evaluating results
The clustering results are shown in four images. Figure 9 shows the result
with eight clusters, and Figure 10 shows the result with three clusters.
Figure 9. Clustering of Iris dataset with eight clusters
Figure 10. Clustering of Iris dataset with three clusters

As seen in Figures 9 and 10, the whole dataset is separated into eight clusters in
Figure 9 and into three clusters in Figure 10, shown with different colors. In Figure 9,
most of the samples stick together and it is really hard to distinguish the clusters
clearly. The differences between samples are small. In this case, the clustering result
is not acceptable. On the other hand, in Figure 10 it can easily be seen that the
clustering result is much better than in Figure 9. Even though there is still some
overlap between green and purple, the difference between the three clusters is quite
clear. This case shows the importance of choosing the number of clusters for the
K-means algorithm.
For real datasets, it is sometimes difficult to know how many clusters should be
used. Therefore, it is quite hard to choose the number of clusters. One method is to use
the ISODATA algorithm, which merges and divides clusters to obtain a
reasonable value of k.
Figure 11. Clustering of Iris dataset with bad initialization

Figure 11 shows the result with three clusters but a bad initialization. We
can see that some of the samples change their cluster compared to Figure 10. With
a random initialization, the system obtains different clustering results from run to run.
Therefore, the initialization is very important for a good clustering
result. However, a good initialization is not known in advance. For
this reason, some machine learning systems use a GA (Genetic
Algorithm) to choose the initialization points.
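The idea of letting a genetic algorithm choose the K-means starting points can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation behind the figures: each chromosome is a triple of row indices used as initial centroids, fitness is the within-cluster sum of squared errors (SSE) after a few K-means iterations, and the population size, mutation rate and nine-row sample are arbitrary choices.

```python
import random
from math import dist

# Small illustrative sample in the style of the Iris data (values are assumptions).
POINTS = [
    (5.1, 3.5, 1.4, 0.2), (4.9, 3.0, 1.4, 0.2), (4.7, 3.2, 1.3, 0.2),
    (7.0, 3.2, 4.7, 1.4), (6.4, 3.2, 4.5, 1.5), (6.9, 3.1, 4.9, 1.5),
    (6.3, 3.3, 6.0, 2.5), (5.8, 2.7, 5.1, 1.9), (7.1, 3.0, 5.9, 2.1),
]
K = 3  # number of clusters, so each chromosome has K genes


def kmeans_sse(points, init_indices, iters=10):
    """Run a few K-means iterations from the chosen start points; return SSE."""
    centroids = [points[i] for i in init_indices]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Empty clusters keep their previous centroid.
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return sum(min(dist(p, c) for c in centroids) ** 2 for p in points)


def evolve(points, pop_size=8, generations=15, mutation_rate=0.2, seed=0):
    """Genetic search over triples of row indices used as initial centroids."""
    rng = random.Random(seed)
    n = len(points)

    def cost(chromosome):
        return kmeans_sse(points, chromosome)

    population = [[rng.randrange(n) for _ in range(K)] for _ in range(pop_size)]
    history = []  # best (lowest) SSE seen in each generation
    for _ in range(generations):
        population.sort(key=cost)
        history.append(cost(population[0]))
        parents = population[: pop_size // 2]   # selection: fittest half
        children = [population[0][:]]            # elitism: best survives as is
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, K)            # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:     # mutation: replace one gene
                child[rng.randrange(K)] = rng.randrange(n)
            children.append(child)
        population = children
    best = min(population, key=cost)
    history.append(cost(best))
    return best, history


best, history = evolve(POINTS)
print("initial centroid rows:", best)
print("best SSE per generation:", [round(h, 3) for h in history])
```

Because the best chromosome is always copied into the next generation, the best SSE can never get worse from one generation to the next, which is the property that makes the search usable as an initialization strategy.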
Figure 12 below shows the ground-truth grouping of the Iris dataset, that is, the
species labels that are available as training data
in supervised learning. There are three clusters, corresponding to a good
initialization point. This is the best classification of all shown here. The whole
dataset has been separated properly and the clusters are clearly distinct. Figure
10 shows the standard result of clustering in unsupervised learning. Compared
to Figure 12, Figure 10 still has some small differences, but it works very well.
Almost every sample is assigned to the right cluster.
Figure 12. Clustering of Iris dataset in ground truth
These results show the effect that the number of clusters k and the random
initialization have on the clustering result. They also show the advantages and
disadvantages of the K-means clustering algorithm.

HISTOGRAM:
BOX AND WHISKER PLOTS (give an idea about the distribution of the input attributes)

5.1 TEAM ORGANIZATION
Team Structure :
Fig 5.4.1 : Team Structure
5.1.1 Team structure
Team Leader: Sunil Rajput
Software Developer: Sunil Rajput
Hardware Developer: Ashish Yadav, Mayank Patil.
Documentation: Ashish Kumar Singh.

CHAPTER 6
SUMMARY AND CONCLUSION

The primary goal of supervised learning is to build a model that generalizes. In
this project we make predictions on unseen data, that is, data not used to train
the model. The machine learning model should therefore accurately predict the
species of future flowers rather than only the labels of the data it was trained
on. With the rapid development of technology, AI has been applied in many fields.
Machine learning is the most fundamental approach to achieving AI. This report
describes the working principle of machine learning, two different forms of
machine learning and an application of machine learning. In addition, a case study
of Iris flower recognition introduces the workflow of machine learning in pattern
recognition. In this case study, the meaning of pattern recognition and the way
machine learning works in pattern recognition are described. The K-means
algorithm, a very simple unsupervised machine learning algorithm,
is used. Evolutionary algorithms have been around since the early
sixties. They apply the rules of nature: evolution through selection of the fittest
individuals, where the individuals represent solutions to a mathematical problem.
Genetic algorithms are often regarded as among the most robust kinds of evolutionary
algorithm. The work also shows how to use SciKit-learn or Anaconda 3.0 software
to learn machine learning.

[1] Abbas MAkbari Z. (2010). "A multilevel evolutionary algorithm for optimizing numerical
functions". IJIEC 2 (2011): 419-430.
[2] Ananya (2017). What is Diabetes. Retrieved online from
https://www.news-medical.net/health/What-is-Diabetes.aspx
[3] Coffin, D.; S., Robert E. (2008). "Linkage Learning in Estimation of Distribution Algorithms".
Linkage in Evolutionary Computation. Springer Berlin Heidelberg: 141-156.
doi:10.1007/978-3-540-85068-7_7.
[4] Eiben, A. E. et al. (1994). "Genetic algorithms with multi-parent recombination". PPSN III:
Proceedings of the International Conference on Evolutionary Computation. The Third
Conference on Parallel Problem Solving from Nature: 78-87. ISBN 3-540-58484-6.
[5] Clustering - K-means - Interactive demo. Available at:
http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html. Consulted 22
AUG 2013.
[6] Bache, K. & Lichman, M. 2013. UCI Machine Learning Repository
[http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and
Computer Science.
[7] Bishop, C. 2006. Pattern Recognition and Machine Learning. New York: Springer, pp. 424-428.
[8] Fisher, R.A. 1936. UCI Machine Learning Repository: Iris Data Set. Available at:
http://archive.ics.uci.edu/ml/datasets/Iris. Consulted 10 AUG 2013.
[9] Mitchell, T. 1997. Machine Learning. McGraw Hill.
[10]
[11] dy of Classification Techniques for Fire Data
7 (1), 2016, 78-82.
