Indian Institute of Technology Jodhpur
Computer Science and Engineering
Sixth Semester (2015-2016)

Machine Learning (Building and comparing various machine learning models to recognize handwritten digits)

Team Members: Shrey Maheshwari (ug201314017)
              Ravi Prakash Gupta (ug201310027)
Mentor: Prof. K. R. Chowdhary
Contents

1 Introduction
2 Theory
3 Implementation (Data Structures and Algorithms)
4 Application
5 Result
6 Conclusion
1 Introduction

The data file contains grayscale images of hand-drawn digits, from zero through nine. Each image is 16 pixels in height and 16 pixels in width, for a total of 256 pixels. Each pixel has a single pixel value associated with it, indicating the lightness or darkness of that pixel. Each image is 8-bit, single-channel, so this pixel value is an integer between 0 and 255, inclusive. We binarized the images as follows: value = 1 if the pixel value > 127, value = 0 otherwise. Previously each pixel value took 8 bits; now each pixel value takes only 1 bit, so one image takes only 256 bits. The data set (train.csv) has 266 columns: the first 256 columns are the pixel values, and the other 10 indicate the label, i.e. the digit that was drawn by the user. We divided our data into two sets:

1. Training data, comprising 80% of the data.
2. Test data, comprising 20% of the data.
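As a minimal sketch of this preprocessing, assuming a headerless train.csv laid out as described above (256 pixel columns followed by 10 label columns; all names here are illustrative, not taken from the project code):

import numpy as np

# Load the data; assumes a headerless CSV with 266 columns:
# 256 pixel values followed by a 10-column label encoding.
data = np.loadtxt("train.csv", delimiter=",")

pixels = data[:, :256]
labels = data[:, 256:]

# Binarize (if the file stores raw 0-255 values):
# 1 if the 8-bit pixel value exceeds 127, else 0.
pixels = (pixels > 127).astype(np.uint8)

# 80/20 train/test split after shuffling.
rng = np.random.default_rng(0)
idx = rng.permutation(len(pixels))
split = int(0.8 * len(pixels))
X_train, X_test = pixels[idx[:split]], pixels[idx[split:]]
y_train, y_test = labels[idx[:split]], labels[idx[split:]]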
Figure 1: Data.
The test data set (test.csv) is the same as the training set, except that it does not contain the "label" columns.
Figure 2: Visualization of data
Classification is the process of assigning new data to a category based on training data from known categories. In this paper, we use a set of human-identified digit images split into a training and a test set. A classifier learns on the training images and labels and produces output on the test images. The output is then compared to the test labels to evaluate classification performance. A good classifier should be able to learn on the training data while maintaining the generalization needed to be accurate when identifying the test set.
2 Theory

The given problem falls under the category of supervised learning. Supervised learning is the machine learning task of inferring a function from supervised training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). Our problem is a multiclass classification problem. To solve it we used logistic regression, a regression model in which the dependent variable is categorical. The logistic function is defined as follows:
\sigma(t) = \frac{e^t}{1 + e^t} = \frac{1}{1 + e^{-t}}
Figure 3: The logistic function.
The range of the logistic function is (0, 1), so each prediction falls in (0, 1) and indicates the probability that the input is the digit for which that logistic regression model was trained. Our final answer is then the index at which the probability is maximum. We applied the gradient descent algorithm because it is simple to implement. Gradient descent is a first-order optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point. If instead one takes steps proportional to the positive of the gradient, one approaches a local maximum of that function; the procedure is then known as gradient ascent. We used around 500 iterations to reach a saturation state after which the error was not decreasing much. We plotted the error vs. iterations curve to ensure that the error was always decreasing; if that had not been the case, we would have reduced our learning rate. We used regularization to ensure that our model does not overfit the training data. Regularization, in mathematics and statistics and particularly in the fields of machine learning and inverse problems, refers to a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting. In general, a regularization term R(f) is introduced to a general loss function:
\min_f \sum_{i=1}^{n} V(f(\hat{x}_i), \hat{y}_i) + \lambda R(f)
for a loss function V that describes the cost of predicting f(x) when the label is y, such as the square loss or hinge loss, and for a term λ which controls the importance of the regularization term. R(f) is typically a penalty on the complexity of f, such as restrictions for smoothness or bounds on the vector space norm. For our setting, this general form specializes to the regularized logistic cost shown below.
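As a concrete instance (a standard textbook form, not quoted from the report), taking V to be the logistic log-loss and R(f) to be the squared L2 norm of the weights gives

J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2,

whose gradient descent update (with the bias term \theta_0 conventionally left unpenalized) is

\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right], \qquad j \geq 1.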
There are 10 labels, but logistic regression is a binary classifier, so we need to train 10 logistic regression classifiers, one per digit. We then applied the one-vs-all method to choose the final answer.
The one-vs-all strategy involves training a single classifier per class, with the samples of that class as positive samples and all other samples as negatives. This strategy requires the base classifiers to produce a real-valued confidence score for their decisions, rather than just a class label; discrete class labels alone can lead to ambiguities, where multiple classes are predicted for a single sample.
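A minimal sketch of one-vs-all prediction, assuming the 10 learned weight vectors are stacked as columns of a hypothetical matrix Theta of shape (n_features, 10):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_one_vs_all(Theta, X):
    """Return, for each row of X, the digit whose binary classifier
    produces the highest confidence score.

    Theta: (n_features, 10), one learned column per digit.
    X: (n_examples, n_features) matrix of examples.
    """
    scores = sigmoid(X @ Theta)       # shape (n_examples, 10)
    return np.argmax(scores, axis=1)  # index of the most confident class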
3 Implementation (Data Structures and Algorithms)
First we initialized our learning parameter, denoted θ, to all zeros. Our hypothesis was the sigmoid of Xθ, where X is the training data and θ is the learned parameter. We used the sigmoid function because it gives output between 0 and 1. We wanted the output of the hypothesis between 0 and 1 because this is a multiclass problem that we are converting into 10 binary classifiers. The sigmoid function looks like
S(t) = \frac{1}{1 + e^{-t}}

Its range is (0, 1).
We defined our cost function, which measures the error between our prediction and the actual value, as

J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right)\right]
It is a convex function, so the problem of converging at a local optimum does not arise. The initial value of the error was

array([ 0.69314718, 0.69314718, 0.69314718, 0.69314718, 0.69314718,
        0.69314718, 0.69314718, 0.69314718, 0.69314718, 0.69314718])

i.e. ln 2 for each of the 10 classifiers, as expected: with θ = 0 the hypothesis outputs 0.5 for every example, and −log 0.5 = ln 2 ≈ 0.693.
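A vectorized sketch of this cost, one value per binary classifier, assuming X of shape (m, n_features), a one-hot label matrix Y of shape (m, 10), and a parameter matrix Theta of shape (n_features, 10) (names are illustrative):

import numpy as np

def cost(Theta, X, Y, eps=1e-12):
    """Logistic (cross-entropy) cost, one value per binary classifier."""
    m = X.shape[0]
    H = 1.0 / (1.0 + np.exp(-(X @ Theta)))  # hypothesis, shape (m, 10)
    # eps guards against log(0); with Theta = 0, every entry equals ln 2.
    return -(Y * np.log(H + eps)
             + (1 - Y) * np.log(1 - H + eps)).sum(axis=0) / m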
We initialized the learning rate with some value; initially a high value was creating problems, so we decreased it until the problem was solved. Finding the best learning rate is a matter of trial and error. Then we implemented the loop in which we updated our learning parameter as
\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}
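A minimal sketch of this update loop, under the same shape assumptions as the cost sketch above (the values of alpha and n_iters reflect what the report settles on later; an L2 term could be added as in Section 2):

import numpy as np

def gradient_descent(X, Y, alpha=0.5, n_iters=500):
    """Batch gradient descent for 10 binary logistic classifiers at once."""
    m, n = X.shape
    Theta = np.zeros((n, Y.shape[1]))       # parameters initialized to zero
    history = []                            # cost per iteration, for plotting
    for _ in range(n_iters):
        H = 1.0 / (1.0 + np.exp(-(X @ Theta)))
        Theta -= alpha / m * X.T @ (H - Y)  # simultaneous update of all theta_j
        history.append(cost(Theta, X, Y))   # cost() as sketched above
    return Theta, history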
In every iteration we stored the value of the cost error to ensure that it was decreasing. Initially there was a problem: the cost error was not decreasing continuously; it sometimes increased in between. We found that the learning rate was too high, causing the updates to overshoot. So we decreased the learning rate and set it to the maximum value at which the cost error function did not overshoot.
Then we plotted the cost error vs. number of iterations to ensure that the function was strictly decreasing and to see whether the error had saturated. We found that after 200 iterations the error had still not saturated, so we increased the number of iterations. We arrived at the best number of iterations by trial and error and finally settled on 500.
Figure 4: Cost error vs. iterations.
Our hypothesis is a linear model, x_1\theta_1 + x_2\theta_2 + x_3\theta_3 + x_4\theta_4 + \ldots, passed through the sigmoid. We predicted output on the test data by multiplying the matrix of test data by the learned parameters. We obtained 10 outputs for every fresh example; the output for each test example is a 10 × 1 vector. The value at each index indicates the probability of that test example being the digit at that index, so we choose the index with the maximum value as our final answer.
In sensitive applications like cheque reading in banks we can't afford to make a single mistake, so a different approach is needed: if all 10 values of our prediction are less than some threshold (say 0.7), i.e. none of the models is confident about the prediction, we can output "not able to recognize" so that the case is handled manually; a sketch of this rule follows. We then tried different combinations of learning rate and number of iterations to achieve the best accuracy on the test data.
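A sketch of this rejection rule, under the same assumptions as the one-vs-all predictor above (the 0.7 threshold is the illustrative value from the text):

import numpy as np

def predict_with_rejection(Theta, X, threshold=0.7):
    """Return the predicted digit per example, or -1 ("not able to
    recognize") when no classifier is confident enough."""
    scores = 1.0 / (1.0 + np.exp(-(X @ Theta)))  # shape (n_examples, 10)
    preds = np.argmax(scores, axis=1)
    preds[scores.max(axis=1) < threshold] = -1   # reject low-confidence cases
    return preds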
We finally set the number of iterations to 500 and the learning rate to 0.5. The data structures used were arrays, matrices, and lists. We used matrices to store the training data, test data, and learned parameters. A matrix was the best data structure to use because we needed to take the transpose of the data and sometimes to add or remove rows and columns. It also made matrix multiplication easy, and we saved a lot of time by implementing vectorization instead of loops, which was only possible with the matrix data structure in the numpy module. We used the Python list data structure to store the cost error values in every iteration, since it was easy to append values to a list.
We used arrays (single-dimension matrices) to plot the graphs and to hold some other useful information. We used the Matplotlib package to plot the graphs: we provided values for the x-axis, plotted the cost error values at those x-axis values, and joined them. We obtained a decreasing curve that saturated after some value of x.
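A sketch of this diagnostic plot, reusing the hypothetical gradient_descent and data-loading sketches above:

import matplotlib.pyplot as plt
import numpy as np

# history holds one cost vector (10 values) per iteration.
Theta, history = gradient_descent(X_train, y_train)

costs = np.array(history)      # shape (n_iters, 10)
plt.plot(costs.mean(axis=1))   # mean cost across the 10 classifiers
plt.xlabel("iteration")
plt.ylabel("mean cost error")
plt.title("Cost error vs. iterations")
plt.show()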
import numpy as np

def sigmoid(z):
    # Computes 1 / (1 + e^(-z)) elementwise.
    a = np.exp(-z)
    a = 1 + a
    a = 1 / a
    return a
4 Application

Handwritten digit recognition has wide applications. It can be used in banks for amount reading, although that use is very sensitive because we can't afford a single mistake, so we should add a feature to our model: if the confidence in the classification is low, it should output "not able to recognize" and that case should be handled manually. It can further be extended to character recognition for various languages. It can be used in post offices for postal code reading, where it would reduce the workload significantly and make the process faster.

The same project can be further extended to read telephone numbers; in that case we first need to separate the individual digits and then recognize them one by one. It can also convert a handwritten document into a digital document that you can edit. This is useful for applications that translate sentences from one language to another: those applications only work on typed input, so a handwritten character recognition system can recognize the characters and then provide input to the translator application.
5 Result

Figure 5: Table of accuracy obtained.
6 Conclusion

In this paper, a method to increase handwritten digit recognition rates by combining feature extraction methods is proposed. Experimental results showed that complementary features can significantly improve recognition performance. The proposed concavity feature extraction method in conjunction with gradient features gave the highest recognition accuracy in the majority of experiments. The method worked well with chaincode features as well, being one of the two top performers. It also has the lowest feature count among the observed complementary features, which lowers the computational cost of classification. Experiments using reduced training sets showed that the proposed concavity method outperforms the other observed approaches, making it useful for applications requiring a small training set. Adding training instances from another dataset affected recognition accuracy differently for different datasets. Accuracy increased on two datasets and decreased on one, indicating that the learning process is sensitive to small differences in image retrieval and preprocessing. Overall, the proposed method achieved the best performance.