Neural Networks in R
Venkat Reddy
Statinfer.com
Data Science Training and R&D
Corporate Training
Classroom Training
Online Training
Contact us
info@statinfer.com
venkat@statinfer.com
Note
•This presentation is just my class notes. The course notes for data science training were written by me, as an aid for myself.
•The best way to treat this is as a high-level summary; the actual session went more in depth and contained detailed information and examples
•Most of this material was written as informal notes, not intended for publication
•Please send questions/comments/corrections to info@statinfer.com
•Please check our website statinfer.com for latest version of this
document
-Venkata Reddy Konasani
(Cofounder statinfer.com)
Contents
•Neural network Intuition
•Neural network and vocabulary
•Neural network algorithm
•Math behind neural network algorithm
•Building the neural networks
•Validating the neural network model
•Neural network applications
•Image recognition using neural networks
Recap of Logistic Regression
•Categorical output YES/NO type
•Using the predictor variables to predict the categorical output
Decision Boundary – Logistic Regression
•The line or margin that separates the classes
•Classification algorithms are all about finding the decision boundaries
•It need not always be a straight line
•The final function of our decision boundary looks like:
• Y = 1 if wᵀx + w0 > 0; else Y = 0
[Figure: two classes in the (x1, x2) plane separated by a linear decision boundary]
Decision Boundary – Logistic Regression
•In logistic regression, the decision boundary can be derived from the regression coefficients and the threshold.
• Consider the logistic regression line p(y) = e^(b0+b1x1+b2x2) / (1 + e^(b0+b1x1+b2x2))
• Suppose p(y) > 0.5 maps to class-1, or else class-0
• log(p/(1−p)) = b0 + b1x1 + b2x2
• log(0.5/0.5) = b0 + b1x1 + b2x2
• 0 = b0 + b1x1 + b2x2
• b0 + b1x1 + b2x2 = 0 is the decision boundary line
Decision Boundary – Logistic Regression
•Rewriting it in mx + c form:
• X2 = (−b1/b2)X1 + (−b0/b2)
•Anything above this line is class-1; anything below is class-0:
• X2 > (−b1/b2)X1 + (−b0/b2) is class-1
• X2 < (−b1/b2)X1 + (−b0/b2) is class-0
• X2 = (−b1/b2)X1 + (−b0/b2) is a tie, with probability 0.5
•We can change the decision boundary by changing the threshold value (here 0.5)
LAB: Logistic Regression and Decision Boundary
LAB: Logistic Regression
•Dataset: Emp_Productivity/Emp_Productivity.csv
•Filter the data and take a subset of the above dataset, using the filter condition Sample_Set < 3
•Draw a scatter plot that shows Age on the X-axis and Experience on the Y-axis. Try to distinguish the two classes with colors or shapes (visualizing the classes)
•Build a logistic regression model to predict Productivity using Age and Experience
•Create the confusion matrix
•Calculate the accuracy and error rates
LAB: Decision Boundary
•Draw a scatter plot that shows Age on the X-axis and Experience on the Y-axis. Try to distinguish the two classes with colors or shapes (visualizing the classes)
•Build a logistic regression model to predict Productivity using Age and Experience
•Finally, draw the decision boundary for this logistic regression model
Code: Logistic Regression
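The original slides show R screenshots here. Below is a minimal sketch of the lab steps, assuming the column names Age, Experience, Productivity and Sample_Set from the lab description:

Emp_Productivity_raw <- read.csv("Emp_Productivity/Emp_Productivity.csv")
Emp_Productivity1 <- Emp_Productivity_raw[Emp_Productivity_raw$Sample_Set < 3, ]

# Visualize the two classes
plot(Emp_Productivity1$Age, Emp_Productivity1$Experience,
     col = ifelse(Emp_Productivity1$Productivity == 1, "blue", "red"),
     pch = 19, xlab = "Age", ylab = "Experience")

# Logistic regression model
emp_model <- glm(Productivity ~ Age + Experience,
                 data = Emp_Productivity1, family = binomial())

# Confusion matrix, accuracy and error rate (0.5 threshold)
pred_class <- ifelse(predict(emp_model, type = "response") > 0.5, 1, 0)
conf_mat <- table(Actual = Emp_Productivity1$Productivity, Predicted = pred_class)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
c(accuracy = accuracy, error = 1 - accuracy)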
Code: Decision Boundary
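A sketch of the boundary plot, derived from the fitted coefficients exactly as on the "Rewriting it in mx + c form" slide; it reuses emp_model from the previous sketch:

# Decision boundary: X2 = (-b1/b2) X1 + (-b0/b2)
b <- coef(emp_model)
plot(Emp_Productivity1$Age, Emp_Productivity1$Experience,
     col = ifelse(Emp_Productivity1$Productivity == 1, "blue", "red"),
     pch = 19, xlab = "Age", ylab = "Experience")
abline(a = -b["(Intercept)"] / b["Experience"],
       b = -b["Age"] / b["Experience"], lwd = 2)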
New representation for logistic regression
y = e^(β0 + β1x1 + β2x2) / (1 + e^(β0 + β1x1 + β2x2))
y = 1 / (1 + e^−(β0 + β1x1 + β2x2))

[Diagram: inputs x1, x2 feed a single node through weights w1, w2 with bias w0; the node computes w0 + w1x1 + w2x2 and outputs y]

y = g(Σ wk xk)
y = g(w0 + w1x1 + w2x2), where g(x) = 1 / (1 + e^−x)
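As a quick illustration, the sigmoid g reproduces the logistic regression probability; the weight values below are placeholders, not fitted coefficients:

g <- function(u) 1 / (1 + exp(-u))      # the sigmoid "activation"
# y = g(w0 + w1*x1 + w2*x2) equals e^(w0+w1x1+w2x2) / (1 + e^(w0+w1x1+w2x2))
g(0.5 + 0.8 * 2 + (-0.3) * 1)           # placeholder weights w0, w1, w2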
Finding the weights in logistic regression
[Diagram: x1, x2 feed a single node through weights w0, w1, w2, producing y]

out(x) = g(Σ wk xk)

We find w to minimize Σ(i=1..n) [yi − g(Σ wk xk)]²

The above output is a non-linear function of a linear combination of the inputs: a typical multiple logistic regression line.
LAB: Non-Linear Decision Boundaries
•Dataset: Emp_Productivity/Emp_Productivity_All_Sites.csv
•Draw a scatter plot that shows Age on the X-axis and Experience on the Y-axis. Try to distinguish the two classes with colors or shapes (visualizing the classes)
•Build a logistic regression model to predict Productivity using Age and Experience
•Finally, draw the decision boundary for this logistic regression model
•Create the confusion matrix
•Calculate the accuracy and error rates
Code: Non-Linear Decision Boundaries
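The screenshots are not reproduced; a minimal sketch of the steps, reusing the plotting and confusion-matrix pattern from the earlier lab:

Emp_all <- read.csv("Emp_Productivity/Emp_Productivity_All_Sites.csv")

plot(Emp_all$Age, Emp_all$Experience,
     col = ifelse(Emp_all$Productivity == 1, "blue", "red"),
     pch = 19, xlab = "Age", ylab = "Experience")

model_all <- glm(Productivity ~ Age + Experience,
                 data = Emp_all, family = binomial())

# Decision boundary X2 = (-b1/b2) X1 + (-b0/b2)
b <- coef(model_all)
abline(-b["(Intercept)"] / b["Experience"], -b["Age"] / b["Experience"], lwd = 2)

# Confusion matrix: the deck's point is that a single line fits poorly here
pred <- ifelse(predict(model_all, type = "response") > 0.5, 1, 0)
conf_mat <- table(Actual = Emp_all$Productivity, Predicted = pred)
c(accuracy = sum(diag(conf_mat)) / sum(conf_mat))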
Non-Linear Decision Boundaries - Issue
[Figure: a dataset whose two classes cannot be separated by a single straight line in the (x1, x2) plane]
Non-Linear Decision Boundaries - Issues
•A logistic regression line doesn't seem to be a good option when we have non-linear decision boundaries
Non-Linear Decision Boundaries - Solution
Intermediate outputs
[Figure: the (x1, x2) scatter split into two regions, each fitted by its own logistic model (Model 1 and Model 2)]

Intermediate output 1: out(x) = g(Σ wk xk), say h1
Intermediate output 2: out(x) = g(Σ wk xk), say h2
The Intermediate output
•Directly predicting y from the x's is challenging.
•We can instead predict h, the intermediate output, which will in turn predict y.

[Diagram: x1 and x2 feed hidden nodes h1 and h2 through weights w11, w12, w21, w22; h1 and h2 feed y through weights W1, W2]
Finding the weights for intermediate outputs

Intermediate output 1: h1 = out(x) = g(Σ w1k xk)
We find w1 to minimize Σ(i=1..n) [h1i − g(Σ w1k xk)]²

Intermediate output 2: h2 = out(x) = g(Σ w2k xk)
We find w2 to minimize Σ(i=1..n) [h2i − g(Σ w2k xk)]²

Final output: y = out(h) = g(Σ Wj hj)
We find W to minimize Σ(i=1..n) [yi − g(Σ Wj hji)]²

[Diagram: x1, x2 → h1 = g(Σ w1k xk) and h2 = g(Σ w2k xk) → y = g(Σ Wj hj)]
LAB: Intermediate output
•Dataset: Emp_Productivity/Emp_Productivity_All_Sites.csv
•Filter the data and take the first 74 observations from the above dataset.
•Build a logistic regression model to predict Productivity using Age and Experience
•Calculate the prediction probabilities for all the inputs. Store the probabilities in the inter1 variable
•Filter the data and take observations from row 34 onwards.
•Build a logistic regression model to predict Productivity using Age and Experience
•Calculate the prediction probabilities for all the inputs. Store the probabilities in the inter2 variable
•Build a consolidated model to predict Productivity using the inter1 and inter2 variables
•Create the confusion matrix and find the accuracy and error rates for the consolidated model
Code: Intermediate output
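The screenshot slides are not reproduced; a minimal sketch of the two-stage model, with the row cut-offs taken from the lab text:

Emp_all <- read.csv("Emp_Productivity/Emp_Productivity_All_Sites.csv")

# Model 1 on the first 74 rows; score every row
model1 <- glm(Productivity ~ Age + Experience,
              data = Emp_all[1:74, ], family = binomial())
Emp_all$inter1 <- predict(model1, newdata = Emp_all, type = "response")

# Model 2 on rows 34 onwards; score every row
model2 <- glm(Productivity ~ Age + Experience,
              data = Emp_all[34:nrow(Emp_all), ], family = binomial())
Emp_all$inter2 <- predict(model2, newdata = Emp_all, type = "response")

# Consolidated model on the two intermediate outputs
model_comb <- glm(Productivity ~ inter1 + inter2,
                  data = Emp_all, family = binomial())
pred <- ifelse(predict(model_comb, type = "response") > 0.5, 1, 0)
conf_mat <- table(Actual = Emp_all$Productivity, Predicted = pred)
sum(diag(conf_mat)) / sum(conf_mat)   # accuracy of the consolidated model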
Neural Network intuition
hj = out(x) = g(Σ wjk xk)
Final output: y = out(h) = g(Σ Wj hj) = g(Σ Wj g(Σ wjk xk))

• So h is a non-linear function of a linear combination of inputs: a multiple logistic regression line
• y is a non-linear function of a linear combination of the outputs of logistic regressions
• y is a non-linear function of a linear combination of non-linear functions of linear combinations of the inputs

We find W to minimize Σ(i=1..n) [yi − g(Σ Wj hj)]²
We find {Wj} and {wjk} to minimize Σ(i=1..n) [yi − g(Σ Wj g(Σ wjk xk))]²

Neural networks are all about finding the sets of weights {Wj} and {wjk} using the gradient descent method.
The Neural Networks
•The neural networks methodology is similar to the intermediate output method explained above.
•But we will not manually subset the data to create the different models.
•The neural network technique automatically takes care of all the intermediate outputs using hidden layers
•It works very well for data with non-linear decision boundaries
•The intermediate output layer in the network is known as the hidden layer
•In simple terms, neural networks are multi-layer non-linear regression models.
•If we have a sufficient number of hidden layers, then we can estimate any complex non-linear function
Neural network and vocabulary

[Diagram: inputs x1, x2 and a bias node 1 feed hidden nodes h1, h2 (the hidden layer), which feed the output y]

h1 = 1 / (1 + e^−(w11 + w12 x1 + w22 x2))
h2 = 1 / (1 + e^−(w21 + w13 x1 + w23 x2))
y = 1 / (1 + e^−(W0 + W1 h1 + W2 h2))

• x1, x2 → inputs
• 1 → bias term
• w's are weights
• 1/(1 + e^−u) is the sigmoid function
• y is the output
Why are they called hidden layers?
•A hidden layer "hides" the desired output.
•Instead of predicting the actual output using a single model, we build multiple models to predict intermediate outputs
•There is no standard way of deciding the number of hidden layers.
The Neural network Algorithm
Algorithm for Finding weights
[Diagram: x1, x2 feed h1, h2 through weights w11, w21, w12, w13, w22, w23; h1 and h2 feed y through weights W1, W2, W3]

• The algorithm is all about finding the weights/coefficients
• We randomly initialize some weights; calculate the output by supplying a training input; if there is an error, the weights are adjusted to reduce this error.
The Neural Network Algorithm
•Step 1: Initialization of weights: Randomly select some weights
•Step 2: Training & Activation: Input the training values and perform the calculations forward.
•Step 3: Error Calculation: Calculate the error at the outputs. Use the output error to calculate error fractions at each hidden layer
•Step 4: Weight training: Update the weights to reduce the error, recalculate and repeat the process of training & updating the weights for all the examples.
•Step 5: Stopping criteria: Stop the training and weight-updating process when the minimum error criterion is met
Randomly initialize weights

[Diagram: the network with randomly selected start weights w11..w23 and W1, W2, W3]

Step 1: Initialization of weights: randomly select some weights
Training & Activation

[Diagram: a training input fed forward through the network]

h1 = 1 / (1 + e^−(w11 + w12 x1 + w22 x2))
h2 = 1 / (1 + e^−(w21 + w13 x1 + w23 x2))
y = 1 / (1 + e^−(W0 + W1 h1 + W2 h2))

Training input & calculations: feed forward
Step 2: Input the training values and perform the calculations forward
Error Calculation at Output

Step 3: Calculate the error at the outputs. Use the output error to calculate error fractions at each hidden layer.

Err = Σ(i=1..n) [yi − g(Σ(k=1..m) wk hki)]²
Error Calculation at hidden layers

Step 3 (continued): Use the output error to calculate the error fractions at each hidden layer.

Err = Σ(i=1..n) [yi − g(Σ(k=1..m) wk hki)]²

Back propagation: calculate the error signals backwards
δk = y(1 − y) · W · Err
Calculate weight corrections

Step 4: Update the weights to reduce the error, recalculate and repeat the process.

[Diagram: a correction Δw is computed for every weight in the network: Δw11..Δw23 and ΔW0, ΔW1, ΔW2]

Err = Σ(i=1..n) [yi − g(Σ(k=1..m) wk hki)]²
δk = y(1 − y) · W · Err
Update Weights

Step 4 (continued): Update the weights to reduce the error, recalculate and repeat the process.

w11 := w11 + Δw11    w12 := w12 + Δw12    w13 := w13 + Δw13
w21 := w21 + Δw21    w22 := w22 + Δw22    w23 := w23 + Δw23
W0 := W0 + ΔW0      W1 := W1 + ΔW1      W2 := W2 + ΔW2

Err = Σ(i=1..n) [yi − g(Σ(k=1..m) wk hki)]²
δk = y(1 − y) · W · Err
Stopping Criteria

Step 5: Stop the training and weight-updating process when the minimum error criterion is met.

Minimize Σ(i=1..n) [yi − g(Σ(k=1..m) wk hki)]²
Once Again… Neural network Algorithm
•Step 1: Initialization of weights: Randomly select some weights
•Step 2: Training & Activation: Input the training values and perform the calculations forward.
•Step 3: Error Calculation: Calculate the error at the outputs. Use the output error to calculate error fractions at each hidden layer
•Step 4: Weight training: Update the weights to reduce the error, recalculate and repeat the process of training & updating the weights for all the examples.
•Step 5: Stopping criteria: Stop the training and weight-updating process when the minimum error criterion is met
Neural network Algorithm - Demo
Looks like a dataset that can't be separated by using a single linear decision boundary/perceptron

[Figure: a scatter of class-0 and class-1 points arranged in alternating clusters; no single straight line separates them]
Neural network Algorithm-Demo
•Let's consider a similar but simpler classification example: the XOR gate dataset

Input1 (x1)   Input2 (x2)   Output (y)
     1             1             0
     1             0             1
     0             1             1
     0             0             0
Randomly initialize weights

[Diagram: the XOR network with randomly selected start weights (e.g. 0.5)]

Step 1: Initialization of weights: randomly select some weights
Activation

[Diagram: input (1, 1) fed forward; with the start weights, h1 = 0.818, h2 = 0.731 and y = 0.7137126]

h1 = 1 / (1 + e^−(w11 + w12 x1 + w22 x2))
h2 = 1 / (1 + e^−(w21 + w13 x1 + w23 x2))
y = 1 / (1 + e^−(W0 + W1 h1 + W2 h2))

Training example: input1 = 1, input2 = 1, output = 0
• In this step we input 1 and 1 as input and expect 0 as output.
• With these weights we got an error of −0.714 at the output layer
• We need to adjust the weights
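A small R sketch of this forward pass. The individual start weights are only visible in the deck's figures; the pre-activation sums assumed here (1.5 into h1, 1.0 into h2, and output weights W0 = 1, W1 = −1, W2 = 1) are assumptions chosen to reproduce the slide's numbers:

g <- function(u) 1 / (1 + exp(-u))   # sigmoid activation
h1 <- g(1.5)                 # 0.818 (assumed sum of bias + weights for input (1, 1))
h2 <- g(1.0)                 # 0.731
y  <- g(1 - 1 * h1 + 1 * h2) # 0.7137; the error vs target 0 is -0.7137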
Back-Propagate Errors

At the output:  Ytarget − Yobs = −0.7137126;  Y(1 − Y) = 0.20432693;  δ at Y = −0.1458307
At h2:  δ at Y = −0.1458307;  h2(1 − h2) = 0.1966119;  W = 1;   δ at h2 = −0.0286721
At h1:  δ at Y = −0.1458307;  h1(1 − h1) = 0.1491465;  W = −1;  δ at h1 = 0.0217501

Wjk := Wjk + ΔWjk, where ΔWjk = η · yj · δk
η is the learning parameter
δk = yk(1 − yk) · Err   (for hidden layers: δk = yk(1 − yk) · wj · Err)
Err = expected output − actual output
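The same numbers in R, plugging in the slide's h(1 − h) values and the output-layer weights shown above (W = −1 into the δ for h1, W = 1 for h2):

y_hat <- 0.7137126
err <- 0 - y_hat                         # Ytarget - Yobs = -0.7137126
delta_y <- y_hat * (1 - y_hat) * err     # -0.1458307
delta_h1 <- 0.1491465 * (-1) * delta_y   # h1(1-h1) * W * delta_y =  0.0217501
delta_h2 <- 0.1966119 * ( 1) * delta_y   # h2(1-h2) * W * delta_y = -0.0286721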
Calculate Weight Corrections

ΔWjk = η · yj · δk, with learning rate η = 0.1 and input yj = 1:
• correction for the weight into h1:  ΔW = 0.1 × 1 × 0.0217501 = 0.00217501
• correction for the weight into h2:  ΔW = 0.1 × 1 × (−0.02867) = −0.002867

Wjk := Wjk + ΔWjk, where ΔWjk = η · yj · δk
η is the learning parameter
δk = yk(1 − yk) · Err   (for hidden layers: δk = yk(1 − yk) · wj · Err)
Err = expected output − actual output
Updated Weights

W(new) = W(old) + correction:
• the 0.5 weight becomes 0.5 + 0.00217501 = 0.502175
• a −1 weight becomes −1 + (−0.002867) = −1.002867

Wjk := Wjk + ΔWjk, where ΔWjk = η · yj · δk
η is the learning parameter
δk = yk(1 − yk) · Err   (for hidden layers: δk = yk(1 − yk) · wj · Err)
Err = expected output − actual output
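The full update rule in R, continuing the deltas computed above, with eta = 0.1 and input value 1 as on the slides:

eta <- 0.1
w_h1_new <- 0.5 + eta * 1 * delta_h1   #  0.502175
w_h2_new <- -1  + eta * 1 * delta_h2   # -1.002867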
Updated Weights (contd.)

[Diagram: the network redrawn with the updated weights, e.g. 0.50218]
Iterations and Stopping Criteria
•This iteration is just for one training example (1,1,0). This is just the first epoch.
•We repeat the same process of training and updating of weights for all the data points
•We continue to update the weights until we see no significant change in the error, or when the maximum permissible error criterion is met.
•By updating the weights in this way, we reduce the error slightly. When the error reaches the minimum point the iterations are stopped and the weights are considered optimal for this training set
XOR Gate final NN Model

[The deck shows the fitted network diagram with the final weights for the XOR data]
Building the Neural network
The good news is..
•We don't need to write the code for weight calculation and updating
•There are ready-made functions, libraries and packages available in R
•The gradient descent method is not very easy to understand for non-mathematics students
•Neural network tools don't expect the user to write the code for the full-length back propagation algorithm
Building the neural network in R
•We have a couple of packages available in R:
• nnet
• neuralnet
•We need to mention the dataset, input, output & number of hidden layers as input.
•Neural network calculations are very complex. The algorithm may take some time to produce the results
•One needs to be careful while setting the parameters. The runtime changes based on the input parameter values
LAB: Building the neural network in R
•Build a neural network for XOR data
Code: Building the neural network in R
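The screenshot is not reproduced; a minimal sketch with the neuralnet package. The seed and threshold are illustrative choices, and results vary with the random start weights:

library(neuralnet)
xor_data <- data.frame(x1 = c(1, 1, 0, 0),
                       x2 = c(1, 0, 1, 0),
                       y  = c(0, 1, 1, 0))
set.seed(100)   # re-run with another seed if the algorithm fails to converge
xor_nn <- neuralnet(y ~ x1 + x2, data = xor_data, hidden = 2,
                    threshold = 0.0001, linear.output = FALSE)
plot(xor_nn)                        # network diagram with the final weights
round(xor_nn$net.result[[1]], 3)    # fitted outputs for the four XOR rows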
R Code Options
• neuralnet(Productivity ~ Age + Experience, data = Emp_Productivity_raw, hidden = 2, stepmax = 1e+07, threshold = 0.00001, linear.output = FALSE)
•hidden: the hidden layer specification. A single number is actually the number of nodes in one hidden layer; we can input a vector to add more hidden layers
•stepmax:
• The maximum number of steps while executing the algorithm.
• Sometimes we may need more than 100,000 steps for the algorithm to converge.
• Sometimes we may get an error like "Algorithm didn't converge with the default stepmax"; we need to increase the stepmax parameter value in such cases.
• Additional info:
• One epoch = one complete run of the training data. If epoch = 500 then the algorithm sees the entire data set 500 times
• One iteration is one pass of a "batch" of data through the algorithm (steps). If the batch size is the same as the full training data, then the number of iterations equals the number of epochs
R Code Options
•threshold:
• Connected to weight optimization on the error function
• By default, neuralnet stops once the partial derivatives of the error function change by less than 0.01.
• It is used as a stopping criterion: if the partial derivative of the error function reaches this threshold, the algorithm stops.
• A lower threshold value forces the algorithm through more iterations, for more accuracy.
•The output is expected to be linear by default. We need to specifically mention linear.output = FALSE for classification problems
Code: Building the neural network in R

(Execute a couple of times to get zero error; results depend on the random start weights.)
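Scoring new data from a fitted network uses compute(); a brief hypothetical example continuing the XOR model above:

new_points <- data.frame(x1 = c(1, 0), x2 = c(1, 1))
pred <- compute(xor_nn, new_points)$net.result
ifelse(pred > 0.5, 1, 0)   # predicted classes for the new inputs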
Lab: Building Neural network on Employee productivity data
•Dataset: Emp_Productivity/Emp_Productivity.csv
•Draw a 2D graph between Age, Experience and Productivity
•Build a neural network to predict the Productivity based on Age and Experience
•Plot the neural network with the final weights
•Increase the hidden layers and see the change in accuracy
Code: Neural network on Employee productivity data
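A minimal sketch of these steps, mirroring the neuralnet call shown in the R Code Options slide; the hidden sizes are illustrative:

library(neuralnet)
Emp_Productivity_raw <- read.csv("Emp_Productivity/Emp_Productivity.csv")

emp_nn <- neuralnet(Productivity ~ Age + Experience,
                    data = Emp_Productivity_raw, hidden = 2,
                    stepmax = 1e+07, threshold = 0.00001, linear.output = FALSE)
plot(emp_nn)   # network diagram with the final weights

# Accuracy on the training data
pred <- ifelse(emp_nn$net.result[[1]] > 0.5, 1, 0)
conf_mat <- table(Actual = Emp_Productivity_raw$Productivity, Predicted = pred)
sum(diag(conf_mat)) / sum(conf_mat)

# Try more hidden nodes, e.g. hidden = c(3, 2), and compare the accuracy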
There can be many solutions

[The deck shows three different fitted networks (Set-1, Set-2, Set-3) for the same data, each with a different set of final weights]
Local vs. Global Minimum
• The neural network might give different results with different start weights.
• The algorithm tries to find a local minimum rather than the global minimum.
• There can be many local minima, which means there can be many solutions to a neural network problem
• We need to perform validation checks before choosing the final model.

[Figure: error curve showing a local minimum and the global minimum]
Hidden layers and their role

Multi Layer Neural Network

[Diagram: inputs x1, x2 and a bias node feed the first hidden layer (H11, H12), which feeds the second hidden layer (H21, H22), which feeds the output Y]
The role of hidden layers
• The first hidden layer
• The first layer is nothing but the linear decision boundaries
• The simple logistic regression line outputs
• We can see them as multiple lines on the decision space

[Figure: the non-linearly separable scatter overlaid with several straight lines, one per first-layer node]
The role of hidden layers
• The second hidden layer
• The second layer combines these lines and forms simple decision boundary shapes
• The third hidden layer forms even more complex shapes within the boundaries generated by the second layer
• You can imagine all these layers together dividing the whole objective space into multiple decision boundary shapes; the cases within a shape are class-1, outside the shape are class-2

[Figure: the same scatter with the lines combined into closed regions around the class-1 clusters]
The Number of hidden layers
•There is no concrete rule to choose the right number. We need to choose by trial-and-error validation
•Too few hidden layers might result in imperfect models; the error rate will be high
•A high number of hidden layers might lead to over-fitting, but this can be identified using validation techniques
•The final number is based on the number of predictor variables, the training data size and the complexity of the target.
•When in doubt, it is better to go with many hidden nodes than few. That will ensure higher accuracy, though the training process will be slower
•Cross-validation and the testing error can help us determine the model with the optimal hidden layers
LAB: Digit Recognizer
• Take an image of a handwritten single digit, and determine what that digit is.
• Normalized handwritten digits, automatically scanned from envelopes by the U.S. Postal Service. The original scanned digits are binary and of different sizes and orientations; the images here have been deslanted and size-normalized, resulting in 16 x 16 grayscale images (Le Cun et al., 1990).
• The data are in two gzipped files, and each line consists of the digit id (0-9) followed by the 256 grayscale values.
• Build a neural network model that can be used as the digit recognizer
• Use the test dataset to validate the true classification power of the model
• What is the final accuracy of the model?
Code: Digit Recognizer

# Importing test and training data - USPS Data
# (path separators restored where the extraction dropped them;
#  adjust the paths to your local folder layout)
digits_train <- read.table("D:/Google Drive/Training/Datasets/Digit Recognizer/USPS/zip.train.txt",
                           quote = "\"", comment.char = "")
digits_test <- read.table("D:/Google Drive/Training/Datasets/Digit Recognizer/USPS/zip.test.txt",
                          quote = "\"", comment.char = "")
dim(digits_train)
col_names <- names(digits_train[, -1])
label_levels <- names(table(digits_train$V1))

# Let's see some images
for (i in 1:10)
{
  data_row <- digits_train[i, -1]
  # unlist() is needed so the data-frame row coerces cleanly to numeric
  pixels <- matrix(as.numeric(unlist(data_row)), 16, 16, byrow = TRUE)
  image(pixels, axes = FALSE)
  title(main = paste("Label is", digits_train[i, 1]), font.main = 4)
}
Code: Digit Recognizer

##### Creating multiple columns for multiple outputs
##### We need these variables while building the model
digit_labels <- data.frame(label = digits_train[, 1])
for (i in 1:10)
{
  digit_labels <- cbind(digit_labels, digit_labels$label == i - 1)
  names(digit_labels)[i + 1] <- paste("l", i - 1, sep = "")
}
label_names <- names(digit_labels[, -1])

# Update the training dataset
digits_train1 <- cbind(digits_train, digit_labels)
names(digits_train1)

# The formula y ~ . doesn't work in the neuralnet function,
# so build the full formula explicitly
model_form <- as.formula(paste(paste(label_names, collapse = " + "), "~",
                               paste(col_names, collapse = " + ")))
Code: Digit Recognizer
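The training and validation screenshots are not reproduced; a hedged sketch of the remaining steps, continuing the objects built above. The hidden layer size of 16 is an assumption (the deck's actual setting is only in the screenshots), and training a network on 256 inputs can take a long time:

library(neuralnet)
digit_model <- neuralnet(model_form,
                         data = digits_train1[, c(col_names, label_names)],
                         hidden = 16, linear.output = FALSE)

# Score the test set and pick the output node with the highest activation
test_out <- compute(digit_model, digits_test[, -1])$net.result
pred_label <- max.col(test_out) - 1
conf_mat <- table(Actual = digits_test$V1, Predicted = pred_label)
sum(diag(conf_mat)) / sum(conf_mat)   # overall accuracy on the test data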
Real-world applications
•Self-driving cars, taking video as input
•Speech recognition
•Face recognition
•Cancer cell analysis
•Heart attack prediction
•Currency and stock price prediction
•Credit card default and loan predictions
•Marketing and advertising, by predicting the response probability
•Weather forecasting and rainfall prediction
Real-world applications
•Face recognition:
• https://www.youtube.com/watch?v=57VkfXqJ1LU
• https://www.youtube.com/watch?v=xVQLBbXdVUY
•Autonomous car software
• https://www.youtube.com/watch?v=gG72-SjwxAM
Drawbacks of Neural Networks
•No real theory that explains how to choose the number of hidden layers
•Takes a lot of time when the input data is large; needs powerful computing machines
•Difficult to interpret the results. It is very hard to interpret and measure the impact of individual predictors
•It is not easy to choose the right training sample size and learning rate.
•The local minimum issue: the gradient descent algorithm produces the optimal weights for a local minimum; the global minimum of the error function is not guaranteed
Why the name neural network?
•The neural network algorithm for solving complex learning problems is inspired by the human brain
•Our brains are a huge network of processing elements, containing a network of billions of neurons.
•In our brain, a neuron receives input from other neurons. The inputs are combined and sent to the next neuron
•The artificial neural network algorithm is built on the same logic.
Why the name neural network?
• Dendrites → Input (X)
• Cell body → Processor (Σwx)
• Axon → Output (Y)
Conclusion
•Neural networks are a vast subject. Many data scientists focus solely on neural network techniques
•In this session we practiced the introductory concepts only. Neural networks have many more advanced techniques; there are many algorithms other than back propagation.
•Neural networks work particularly well on certain classes of problems, like image recognition.
•Neural network algorithms are very calculation-intensive. They require highly efficient computing machines. Large datasets take a significant amount of runtime in R; we need to try different types of options and packages.
•Currently there is a lot of exciting research going on around neural networks.
•After gaining sufficient knowledge in this basic session, you may want to explore reinforcement learning, deep learning, etc.
Appendix

Math: How to update the weights?
•We update the weights backwards by iteratively calculating the error
•The formula for updating the weights uses the gradient descent method, or delta rule, also known as the Widrow-Hoff rule
•First we calculate the weight corrections for the output layer, then we take care of the hidden layers
Math: How to update the weights?
• Wjk := Wjk + ΔWjk, where ΔWjk = η · yj · δk
• η is the learning parameter
• δk = yk(1 − yk) · Err   (for hidden layers: δk = yk(1 − yk) · wj · Err)
• Err = expected output − actual output
•The weight corrections are calculated based on the error function
•The new weights are chosen in such a way that the final error in the network is minimized
Math: How does the delta rule work?
• Let's consider a simple example to understand weight updating using the delta rule.
• Suppose we are building a simple logistic regression line and want to find the weight using the weight update rule
• y = 1/(1 + e^−wx) is the equation; we are searching for the optimal w for our data
• Let w be 1, so y = 1/(1 + e^−x) is the initial equation; the error in this initial step is 3.59
• To reduce the error we add a delta to w and make it 1.5 (the blue line), so y = 1/(1 + e^−1.5x) is the updated equation; with the updated weight, the error is 1.57
• We can further reduce the error by increasing w by delta
How does the delta rule work?
• If we repeat the same process of adding delta and updating the weight, we finally end up with the minimum error
• The weight at that final step is the optimal weight; in this example the weight is 8, and the error is 0
• y = 1/(1 + e^−8x) is the final equation
• In this example, we manually changed the weights to reduce the error. This is just for intuition; manual updating is not feasible for complex optimization problems.
• Gradient descent is a scientific optimization method: we update the weights by calculating the gradient of the error function.
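A toy version of this manual search in R. The four data points are hypothetical stand-ins (the deck's actual points are only in a screenshot); the loop keeps adding a fixed delta while the squared error keeps falling:

x <- c(-2, -1, 1, 2)                                    # hypothetical inputs
y <- c(0, 0, 1, 1)                                      # class labels
sse <- function(w) sum((y - 1 / (1 + exp(-w * x)))^2)   # squared error for weight w
w <- 1
while (sse(w) - sse(w + 0.5) > 1e-4) w <- w + 0.5       # add delta while the error still drops
c(weight = w, error = sse(w))                           # large w, error near 0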
Math: How does gradient descent work?
•Gradient descent is one of the famous ways to calculate the local minimum
•By Changing the weights we are moving towards the minimum value of the
error function. The weights are changed by taking steps in the negative
direction of the function gradient(derivative).
141
Error
Weight
statinfer.com
Demo: How does gradient descent work?

Does this method really work?
• We changed the weights; did it reduce the overall error?
• Let's calculate the error with the new weights and see the change

[Diagram: forward pass for input (1, 1) with the updated weights (e.g. 0.50218): h1 = 0.8185456, h2 = 0.7293640, y = 0.7065528]
Gradient Descent method validation
•With our initial set of weights the overall error was 0.7137: Y actual is 0, Y predicted is 0.7137, so error = 0.7137
•The new weights give us a predicted value of 0.70655
•In one iteration, we reduced the error from 0.7137 to 0.70655
•The error is reduced by about 1%. Repeating the same process over multiple epochs and training examples reduces the error further.

                    input1   input2   Output (Y-Actual)   Y Predicted    Error
Old weights            1        1            0            0.71371259   0.71371259
Updated weights        1        1            0            0.70655280   0.70655280
Thank you
Statinfer.com
Download the course videos and handouts from the below link
https://statinfer.com/course/machine-learning-with-r-2/curriculum/?c=b433a9be3189
More Related Content

What's hot (20)

PPTX
Machine learning with R
Maarten Smeets
 
PDF
Face recognition and deep learning โดย ดร. สรรพฤทธิ์ มฤคทัต NECTEC
BAINIDA
 
PDF
Feature engineering pipelines
Ramesh Sampath
 
PDF
Boosted tree
Zhuyi Xue
 
PPTX
Ml9 introduction to-unsupervised_learning_and_clustering_methods
ankit_ppt
 
PPTX
07 learning
ankit_ppt
 
PDF
Meetup_Consumer_Credit_Default_Vers_2_All
Bernard Ong
 
PDF
Feature Reduction Techniques
Vishal Patel
 
PPTX
Matrix decomposition and_applications_to_nlp
ankit_ppt
 
PPTX
Ml10 dimensionality reduction-and_advanced_topics
ankit_ppt
 
PDF
Matrix Factorization
Yusuke Yamamoto
 
PDF
Gradient Boosted Regression Trees in scikit-learn
DataRobot
 
PPTX
Ppt shuai
Xiang Zhang
 
PDF
GLM & GBM in H2O
Sri Ambati
 
PDF
QBIC
Misha Kozik
 
PPTX
Machine learning Algorithms with a Sagemaker demo
Hridyesh Bisht
 
PPTX
Ml3 logistic regression-and_classification_error_metrics
ankit_ppt
 
PDF
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Gabriel Moreira
 
PPTX
Ml7 bagging
ankit_ppt
 
PPTX
Algorithms Design Patterns
Ashwin Shiv
 
Machine learning with R
Maarten Smeets
 
Face recognition and deep learning โดย ดร. สรรพฤทธิ์ มฤคทัต NECTEC
BAINIDA
 
Feature engineering pipelines
Ramesh Sampath
 
Boosted tree
Zhuyi Xue
 
Ml9 introduction to-unsupervised_learning_and_clustering_methods
ankit_ppt
 
07 learning
ankit_ppt
 
Meetup_Consumer_Credit_Default_Vers_2_All
Bernard Ong
 
Feature Reduction Techniques
Vishal Patel
 
Matrix decomposition and_applications_to_nlp
ankit_ppt
 
Ml10 dimensionality reduction-and_advanced_topics
ankit_ppt
 
Matrix Factorization
Yusuke Yamamoto
 
Gradient Boosted Regression Trees in scikit-learn
DataRobot
 
Ppt shuai
Xiang Zhang
 
GLM & GBM in H2O
Sri Ambati
 
Machine learning Algorithms with a Sagemaker demo
Hridyesh Bisht
 
Ml3 logistic regression-and_classification_error_metrics
ankit_ppt
 
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Gabriel Moreira
 
Ml7 bagging
ankit_ppt
 
Algorithms Design Patterns
Ashwin Shiv
 

Similar to Neural Networks made easy (20)

PDF
Artificial Intelligence Course: Linear models
ananth
 
PDF
Day 2 build up your own neural network
HuyPhmNht2
 
PPTX
Introduction to Machine Learning
AI Summary
 
PPTX
Artificial Neural Network
Dessy Amirudin
 
PDF
Lesson_8_DeepLearning.pdf
ssuser7f0b19
 
PDF
Machine learning pt.1: Artificial Neural Networks ® All Rights Reserved
Jonathan Mitchell
 
PDF
Applying Neural Network Models for Binary Classification of EEG Signals.pdf
Patrick Ogbuitepu
 
PPTX
Machine_Learning.pptx
VickyKumar131533
 
PDF
Introduction to Artificial Neural Networks
Stratio
 
PPTX
PREDICT 422 - Module 1.pptx
VikramKumar790542
 
PDF
MLEARN 210 B Autumn 2018: Lecture 1
heinestien
 
PPTX
lecture15-neural-nets (2).pptx
anjithaba
 
PDF
Heuristic design of experiments w meta gradient search
Greg Makowski
 
PPTX
Data Science and Machine Learning with Tensorflow
Shubham Sharma
 
PDF
Introduction to machine learning
Sanghamitra Deb
 
PPTX
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Universitat Politècnica de Catalunya
 
PDF
super-cheatsheet-artificial-intelligence.pdf
ssuser089265
 
PPTX
Introduction to Machine Learning for Java Developers
Zoran Sevarac, PhD
 
PPTX
5_LR_Apr_7_2021.pptx in nature language processing
attaurahman
 
PPTX
Artificial Neural Networks presentations
migob991
 
Artificial Intelligence Course: Linear models
ananth
 
Day 2 build up your own neural network
HuyPhmNht2
 
Introduction to Machine Learning
AI Summary
 
Artificial Neural Network
Dessy Amirudin
 
Lesson_8_DeepLearning.pdf
ssuser7f0b19
 
Machine learning pt.1: Artificial Neural Networks ® All Rights Reserved
Jonathan Mitchell
 
Applying Neural Network Models for Binary Classification of EEG Signals.pdf
Patrick Ogbuitepu
 
Machine_Learning.pptx
VickyKumar131533
 
Introduction to Artificial Neural Networks
Stratio
 
PREDICT 422 - Module 1.pptx
VikramKumar790542
 
MLEARN 210 B Autumn 2018: Lecture 1
heinestien
 
lecture15-neural-nets (2).pptx
anjithaba
 
Heuristic design of experiments w meta gradient search
Greg Makowski
 
Data Science and Machine Learning with Tensorflow
Shubham Sharma
 
Introduction to machine learning
Sanghamitra Deb
 
Multilayer Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intell...
Universitat Politècnica de Catalunya
 
super-cheatsheet-artificial-intelligence.pdf
ssuser089265
 
Introduction to Machine Learning for Java Developers
Zoran Sevarac, PhD
 
5_LR_Apr_7_2021.pptx in nature language processing
attaurahman
 
Artificial Neural Networks presentations
migob991
 
Ad

More from Venkata Reddy Konasani (20)

PDF
Transformers 101
Venkata Reddy Konasani
 
PDF
Machine Learning Deep Learning AI and Data Science
Venkata Reddy Konasani
 
PDF
Model selection and cross validation techniques
Venkata Reddy Konasani
 
PPTX
Decision tree
Venkata Reddy Konasani
 
PPTX
Step By Step Guide to Learn R
Venkata Reddy Konasani
 
PPTX
Credit Risk Model Building Steps
Venkata Reddy Konasani
 
PDF
Table of Contents - Practical Business Analytics using SAS
Venkata Reddy Konasani
 
PPTX
SAS basics Step by step learning
Venkata Reddy Konasani
 
PPTX
Testing of hypothesis case study
Venkata Reddy Konasani
 
DOCX
L101 predictive modeling case_study
Venkata Reddy Konasani
 
PPT
Learning Tableau - Data, Graphs, Filters, Dashboards and Advanced features
Venkata Reddy Konasani
 
PDF
Machine Learning for Dummies
Venkata Reddy Konasani
 
PDF
Online data sources for analaysis
Venkata Reddy Konasani
 
PDF
A data analyst view of Bigdata
Venkata Reddy Konasani
 
PDF
Cluster Analysis for Dummies
Venkata Reddy Konasani
 
PPTX
Introduction to predictive modeling v1
Venkata Reddy Konasani
 
PDF
Big data Introduction by Mohan
Venkata Reddy Konasani
 
PDF
Data Analyst - Interview Guide
Venkata Reddy Konasani
 
PDF
Model building in credit card and loan approval
Venkata Reddy Konasani
 
Transformers 101
Venkata Reddy Konasani
 
Machine Learning Deep Learning AI and Data Science
Venkata Reddy Konasani
 
Model selection and cross validation techniques
Venkata Reddy Konasani
 
Decision tree
Venkata Reddy Konasani
 
Step By Step Guide to Learn R
Venkata Reddy Konasani
 
Credit Risk Model Building Steps
Venkata Reddy Konasani
 
Table of Contents - Practical Business Analytics using SAS
Venkata Reddy Konasani
 
SAS basics Step by step learning
Venkata Reddy Konasani
 
Testing of hypothesis case study
Venkata Reddy Konasani
 
L101 predictive modeling case_study
Venkata Reddy Konasani
 
Learning Tableau - Data, Graphs, Filters, Dashboards and Advanced features
Venkata Reddy Konasani
 
Machine Learning for Dummies
Venkata Reddy Konasani
 
Online data sources for analaysis
Venkata Reddy Konasani
 
A data analyst view of Bigdata
Venkata Reddy Konasani
 
Cluster Analysis for Dummies
Venkata Reddy Konasani
 
Introduction to predictive modeling v1
Venkata Reddy Konasani
 
Big data Introduction by Mohan
Venkata Reddy Konasani
 
Data Analyst - Interview Guide
Venkata Reddy Konasani
 
Model building in credit card and loan approval
Venkata Reddy Konasani
 
Ad

Recently uploaded (20)

PPTX
How to Track Skills & Contracts Using Odoo 18 Employee
Celine George
 
PPTX
ENGLISH 8 WEEK 3 Q1 - Analyzing the linguistic, historical, andor biographica...
OliverOllet
 
PPTX
HEALTH CARE DELIVERY SYSTEM - UNIT 2 - GNM 3RD YEAR.pptx
Priyanshu Anand
 
PPTX
Cleaning Validation Ppt Pharmaceutical validation
Ms. Ashatai Patil
 
PPTX
20250924 Navigating the Future: How to tell the difference between an emergen...
McGuinness Institute
 
PPTX
Rules and Regulations of Madhya Pradesh Library Part-I
SantoshKumarKori2
 
PDF
Virat Kohli- the Pride of Indian cricket
kushpar147
 
PDF
Module 2: Public Health History [Tutorial Slides]
JonathanHallett4
 
PPTX
Basics and rules of probability with real-life uses
ravatkaran694
 
PPTX
Applications of matrices In Real Life_20250724_091307_0000.pptx
gehlotkrish03
 
PPTX
Electrophysiology_of_Heart. Electrophysiology studies in Cardiovascular syste...
Rajshri Ghogare
 
PPT
DRUGS USED IN THERAPY OF SHOCK, Shock Therapy, Treatment or management of shock
Rajshri Ghogare
 
PPTX
Top 10 AI Tools, Like ChatGPT. You Must Learn In 2025
Digilearnings
 
PPTX
TOP 10 AI TOOLS YOU MUST LEARN TO SURVIVE IN 2025 AND ABOVE
digilearnings.com
 
PPTX
The Future of Artificial Intelligence Opportunities and Risks Ahead
vaghelajayendra784
 
DOCX
Unit 5: Speech-language and swallowing disorders
JELLA VISHNU DURGA PRASAD
 
PPTX
How to Close Subscription in Odoo 18 - Odoo Slides
Celine George
 
PPTX
Virus sequence retrieval from NCBI database
yamunaK13
 
PDF
Tips for Writing the Research Title with Examples
Thelma Villaflores
 
PPTX
Unlock the Power of Cursor AI: MuleSoft Integrations
Veera Pallapu
 
How to Track Skills & Contracts Using Odoo 18 Employee
Celine George
 
ENGLISH 8 WEEK 3 Q1 - Analyzing the linguistic, historical, andor biographica...
OliverOllet
 
HEALTH CARE DELIVERY SYSTEM - UNIT 2 - GNM 3RD YEAR.pptx
Priyanshu Anand
 
Cleaning Validation Ppt Pharmaceutical validation
Ms. Ashatai Patil
 
20250924 Navigating the Future: How to tell the difference between an emergen...
McGuinness Institute
 
Rules and Regulations of Madhya Pradesh Library Part-I
SantoshKumarKori2
 
Virat Kohli- the Pride of Indian cricket
kushpar147
 
Module 2: Public Health History [Tutorial Slides]
JonathanHallett4
 
Basics and rules of probability with real-life uses
ravatkaran694
 
Applications of matrices In Real Life_20250724_091307_0000.pptx
gehlotkrish03
 
Electrophysiology_of_Heart. Electrophysiology studies in Cardiovascular syste...
Rajshri Ghogare
 
DRUGS USED IN THERAPY OF SHOCK, Shock Therapy, Treatment or management of shock
Rajshri Ghogare
 
Top 10 AI Tools, Like ChatGPT. You Must Learn In 2025
Digilearnings
 
TOP 10 AI TOOLS YOU MUST LEARN TO SURVIVE IN 2025 AND ABOVE
digilearnings.com
 
The Future of Artificial Intelligence Opportunities and Risks Ahead
vaghelajayendra784
 
Unit 5: Speech-language and swallowing disorders
JELLA VISHNU DURGA PRASAD
 
How to Close Subscription in Odoo 18 - Odoo Slides
Celine George
 
Virus sequence retrieval from NCBI database
yamunaK13
 
Tips for Writing the Research Title with Examples
Thelma Villaflores
 
Unlock the Power of Cursor AI: MuleSoft Integrations
Veera Pallapu
 

Neural Networks made easy

  • 1. Neural Networks in R Venkat Reddy
  • 2. Statinfer.com Data Science Training and R&D statinfer.com 2 Corporate Training Classroom Training Online Training Contact us [email protected] [email protected]
  • 3. Note •This presentation is just my class notes. The course notes for data science training is written by me, as an aid for myself. •The best way to treat this is, as a high-level summary; the actual session went more in depth and contained detailed information and examples •Most of this material was written as informal notes, not intended for publication •Please send questions/comments/corrections to [email protected] •Please check our website statinfer.com for latest version of this document -Venkata Reddy Konasani (Cofounder statinfer.com) statinfer.com 3
  • 5. Contents •Neural network Intuition •Neural network and vocabulary •Neural network algorithm •Math behind neural network algorithm •Building the neural networks •Validating the neural network model •Neural network applications •Image recognition using neural networks 5 statinfer.com
  • 6. Recap of Logistic Regression
  • 7. Recap of Logistic Regression •Categorical output YES/NO type •Using the predictor variables to predict the categorical output 7 statinfer.com
  • 9. Decision Boundary – Logistic Regression •The line or margin that separates the classes •Classification algorithms are all about finding the decision boundaries •It need not be straight line always •The final function of our decision boundary looks like • Y=1 if wTx+w0>0 ; else Y=0 9 x1 x2 statinfer.com
  • 10. Decision Boundary – Logistic Regression •In logistic regression, Decision Boundary can be derived from the logistic regression coefficients and the threshold. • Imagine the logistic regression line p(y)=e(b0+b1x1+b2x2)/1+exp(b0+b1x1+b2x2) • Suppose if p(y)>0.5 then class-1 or else class-0 • log(y/1-y)=b0+b1x1+b2x2 • Log(0.5/0.5)=b0+b1x1+b2x2 • 0=b0+b1x1+b2x2 • b0+b1x1+b2x2=0 is the line 10 statinfer.com
  • 11. Decision Boundary – Logistic Regression •Rewriting it in mx+c form • X2=(-b1/b2)X1+(-b0/b2) •Anything above this line is class-1, below this line is class-0 • X2>(-b1/b2)X1+(-b0/b2)is class-1 • X2<(-b1/b2)X1+(-b0/b2) is class-0 • X2=(-b1/b2)X1+(-b0/b2) tie probability of 0.5 •We can change the decision boundary by changing the threshold value(here 0.5) 11 statinfer.com
  • 12. LAB: Logistic Regression and Decision Boundary
  • 13. LAB: Logistic Regression •Dataset: Emp_Productivity/Emp_Productivity.csv •Filter the data and take a subset from above dataset . Filter condition is Sample_Set<3 •Draw a scatter plot that shows Age on X axis and Experience on Y-axis. Try to distinguish the two classes with colors or shapes (visualizing the classes) •Build a logistic regression model to predict Productivity using age and experience •Create the confusion matrix •Calculate the accuracy and error rates 13 statinfer.com
  • 14. LAB: Decision Boundary •Draw a scatter plot that shows Age on X axis and Experience on Y-axis. Try to distinguish the two classes with colors or shapes (visualizing the classes) •Build a logistic regression model to predict Productivity using age and experience •Finally draw the decision boundary for this logistic regression model 14 statinfer.com
  • 21. New representation for logistic regression
  • 22. New representation for logistic regression 22 𝑦 = 𝑒 𝛽0+𝛽1𝑥1+𝛽2𝑥2 1 + 𝑒 𝛽0+𝛽1𝑥1+𝛽2𝑥2 𝑦 = 1 1 + 𝑒−(𝛽0+𝛽1𝑥1+𝛽2𝑥2) x1 x2 w1 w2 w0 yW0+w1x1+w2x 2 𝑦 = 𝑔(σ 𝑤 𝑘 𝑥 𝑘) 𝑦 = 𝑔 𝑤0 + 𝑤1 𝑥1 + 𝑤2 𝑥2 𝑤ℎ𝑒𝑟𝑒 𝑔 𝑥 = 1 1 + 𝑒−𝑥 statinfer.com
  • 23. Finding the weights in logistic regression 23 x1 x2 W0+w1x1+w2x 2 w1 w2 w0 y 𝑊𝑒 𝑓𝑖𝑛𝑑 𝑤 𝑡𝑜 𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒 σ𝑖=1 𝑛 [𝑦𝑖 − 𝑔 σ 𝑤 𝑘 𝑥 𝑘 ]2 𝑜𝑢𝑡(𝑥) = 𝑔(σ 𝑤 𝑘 𝑥 𝑘) The above output is a non linear function of linear combination of inputs – A typical multiple logistic regression line statinfer.com
  • 25. LAB: Non-Linear Decision Boundaries •Dataset: “Emp_Productivity/ Emp_Productivity_All_Sites.csv” •Draw a scatter plot that shows Age on X axis and Experience on Y-axis. Try to distinguish the two classes with colors or shapes (visualizing the classes) •Build a logistic regression model to predict Productivity using age and experience •Finally draw the decision boundary for this logistic regression model •Create the confusion matrix •Calculate the accuracy and error rates 25 statinfer.com
  • 26. Code: Non-Linear Decision Boundaries 26 statinfer.com
  • 27. Code: Non-Linear Decision Boundaries 27 statinfer.com
  • 28. Code: Non-Linear Decision Boundaries 28 statinfer.com
  • 29. Code: Non-Linear Decision Boundaries 29 statinfer.com
  • 30. Code: Non-Linear Decision Boundaries 30 statinfer.com
  • 33. Non-Linear Decision Boundaries-issues •Logistic Regression line doesn’t seam to be a good option when we have non-linear decision boundaries 33 statinfer.com
  • 35. Intermediate outputs 35 x1 x2 𝐼𝑛𝑡𝑒𝑟𝑚𝑒𝑑𝑖𝑎𝑡𝑒 𝑜𝑢𝑡𝑝𝑢𝑡1 𝑜𝑢𝑡(𝑥) = 𝑔(σ 𝑤 𝑘 𝑥 𝑘) ,Say h1 𝐼𝑛𝑡𝑒𝑟𝑚𝑒𝑑𝑖𝑎𝑡𝑒 𝑜𝑢𝑡𝑝𝑢𝑡2 𝑜𝑢𝑡(𝑥) = 𝑔(σ 𝑤 𝑘 𝑥 𝑘) , Say h2 Model2Model1 statinfer.com
  • 36. The Intermediate output •Using the x’s Directly predicting y is challenging. •We can predict h, the intermediate output, which will indeed predict Y 36 x1 x2 w11 w12 y w22 w21 h1 h2 W1 W2 statinfer.com
  • 37. Finding the weights for intermediate outputs 37 𝐹𝑖𝑛𝑎𝑙 𝑜𝑢𝑡𝑝𝑢𝑡 𝑦 = 𝑜𝑢𝑡(ℎ) = 𝑔(σ 𝑊𝑗ℎ𝑗) 𝐼𝑛𝑡𝑒𝑟𝑚𝑒𝑑𝑖𝑎𝑡𝑒 𝑜𝑢𝑡𝑝𝑢𝑡2 ℎ2 = 𝑜𝑢𝑡(𝑥) = 𝑔(σ 𝑤2𝑘 𝑥 𝑘) 𝑊𝑒 𝑓𝑖𝑛𝑑 𝑤1 𝑡𝑜 𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒 σ𝑖=1 𝑛 [ℎ1 𝑖 − 𝑔 σ 𝑤1𝑘 𝑥 𝑘 ]2 𝑊𝑒 𝑓𝑖𝑛𝑑 𝑤2 𝑡𝑜 𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒 σ𝑖=1 𝑛 [ℎ2 𝑖 − 𝑔 σ 𝑤1𝑘 𝑥 𝑘 ]2 𝐼𝑛𝑡𝑒𝑟𝑚𝑒𝑑𝑖𝑎𝑡𝑒 𝑜𝑢𝑡𝑝𝑢𝑡1 ℎ1 = 𝑜𝑢𝑡(𝑥) = 𝑔(σ 𝑤1𝑘 𝑥 𝑘) 𝑊𝑒 𝑓𝑖𝑛𝑑 𝑊 𝑡𝑜 𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒 σ𝑖=1 𝑛 [𝑦𝑖 − 𝑔 σ 𝑊𝑗ℎ𝑗𝑖 ]2 x1 x2 h1 = 𝑔(෍ 𝑤1𝑘 𝑥 𝑘) 𝑦 = 𝑔(σ 𝑊𝑗ℎ𝑗) ℎ2 = 𝑔(෍ 𝑤2𝑘 𝑥 𝑘) statinfer.com
  • 39. LAB: Intermediate output •Dataset: Emp_Productivity/ Emp_Productivity_All_Sites.csv •Filter the data and take first 74 observations from above dataset . •Build a logistic regression model to predict Productivity using age and experience •Calculate the prediction probabilities for all the inputs. Store the probabilities in inter1 variable •Filter the data and take observations from row 34 onwards. •Build a logistic regression model to predict Productivity using age and experience •Calculate the prediction probabilities for all the inputs. Store the probabilities in inter2 variable •Build a consolidated model to predict productivity using inter-1 and inter-2 variables •Create the confusion matrix and find the accuracy and error rates for the consolidated model 39 statinfer.com
  • 55. Neural Network intuition 55 𝐹𝑖𝑛𝑎𝑙 𝑜𝑢𝑡𝑝𝑢𝑡 𝑦 = 𝑜𝑢𝑡(ℎ) = 𝑔(σ 𝑊𝑗ℎ𝑗) 𝑦 = 𝑜𝑢𝑡(ℎ) = 𝑔(σ 𝑊𝑗 𝑔(σ 𝑤𝑗𝑘 𝑥 𝑘)) ℎ𝑗 = 𝑜𝑢𝑡 𝑥 = 𝑔(σ 𝑤𝑗𝑘 𝑥 𝑘) • So h is a non linear function of linear combination of inputs – A multiple logistic regression line • Y is a non linear function of linear combination of outputs of logistic regressions • Y is a non linear function of linear combination of non linear functions of linear combination of inputs 𝑊𝑒 𝑓𝑖𝑛𝑑 𝑊 𝑡𝑜 𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒 σ𝑖=1 𝑛 [𝑦𝑖 − 𝑔 σ 𝑊𝑗ℎ𝑗 ]2 𝑊𝑒 𝑓𝑖𝑛𝑑 {𝑊𝑗} & {𝑤𝑗𝑘} 𝑡𝑜 𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒 σ𝑖=1 𝑛 [𝑦𝑖 − 𝑔(σ 𝑊𝑗 𝑔(σ 𝑤𝑗𝑘 𝑥 𝑘))]2 Neural networks is all about finding the sets of weights {Wj,} and {wjk} using Gradient Descent Method statinfer.com
  • 56. Neural Network intuition 56 𝐹𝑖𝑛𝑎𝑙 𝑜𝑢𝑡𝑝𝑢𝑡 𝑦 = 𝑜𝑢𝑡(ℎ) = 𝑔(σ 𝑊𝑗ℎ𝑗) 𝐼𝑛𝑡𝑒𝑟𝑚𝑒𝑑𝑖𝑎𝑡𝑒 𝑜𝑢𝑡𝑝𝑢𝑡2 ℎ2 = 𝑜𝑢𝑡(𝑥) = 𝑔(σ 𝑤2𝑘 𝑥 𝑘) 𝑊𝑒 𝑓𝑖𝑛𝑑 𝑤1 𝑡𝑜 𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒 σ𝑖=1 𝑛 [ℎ1 𝑖 − 𝑔 σ 𝑤1𝑘 𝑥 𝑘 ]2 𝑊𝑒 𝑓𝑖𝑛𝑑 𝑤2 𝑡𝑜 𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒 σ𝑖=1 𝑛 [ℎ2 𝑖 − 𝑔 σ 𝑤1𝑘 𝑥 𝑘 ]2 𝐼𝑛𝑡𝑒𝑟𝑚𝑒𝑑𝑖𝑎𝑡𝑒 𝑜𝑢𝑡𝑝𝑢𝑡1 ℎ1 = 𝑜𝑢𝑡(𝑥) = 𝑔(σ 𝑤1𝑘 𝑥 𝑘) 𝑊𝑒 𝑓𝑖𝑛𝑑 𝑊 𝑡𝑜 𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒 σ𝑖=1 𝑛 [𝑦𝑖 − 𝑔 σ 𝑊𝑗ℎ𝑗𝑖 ]2 x1 x2 h1 = 𝑔(෍ 𝑤1𝑘 𝑥 𝑘) 𝑦 = 𝑔(σ 𝑊𝑗ℎ𝑗) ℎ2 = 𝑔(෍ 𝑤2𝑘 𝑥 𝑘) statinfer.com
  • 57. The Neural Networks •The neural networks methodology is similar to the intermediate output method explained above. •But we will not manually subset the data to create the different models. •The neural network technique automatically takes care of all the intermediate outputs using hidden layers •It works very well for the data with non-linear decision boundaries •The intermediate output layer in the network is known as hidden layer •In Simple terms, neural networks are multi layer nonlinear regression models. •If we have sufficient number of hidden layers, then we can estimate any complex non-linear function 57 statinfer.com
  • 58. Neural network and vocabulary 58 1 x1 x2 h1 h2 y Hidden Layer Input Output h1= 1 1+𝑒−(𝑤11 +𝑤12 𝑥1 +𝑤22 𝑥2 ) h2= 1 1+𝑒−(𝑤21 +𝑤13 𝑥1 +𝑤23 𝑥2 ) 𝑦 = 1 1 + 𝑒−(𝑊0+𝑊1ℎ1+𝑊2ℎ2) • X1,X2 inputs • 1  bias term • W’s are weights • 1/(1+e-u) is the sigmoid function • Y is output statinfer.com
  • 59. Why are they called hidden layers? •A hidden layer “hides” the desired output. •Instead of predicting the actual output using a single model, build multiple models to predict intermediate output •There is no standard way of deciding the number of hidden layers. 59 statinfer.com
  • 60. The Neural network Algorithm
  • 61. Algorithm for Finding weights 61 x1 x2 h1 𝑦 h2 w11 w21 w12 w13 w22 w23 W2 W1 W3 • Algorithm is all about finding the weights/coefficients • We randomly initialize some weights; Calculate the output by supplying training input; If there is an error the weights are adjusted to reduce this error. statinfer.com
  • 62. The Neural Network Algorithm •Step 1: Initialization of weights: Randomly select some weights •Step 2 : Training & Activation: Input the training values and perform the calculations forward. •Step 3 : Error Calculation: Calculate the error at the outputs. Use the output error to calculate error fractions at each hidden layer •Step 4: Weight training : Update the weights to reduce the error, recalculate and repeat the process of training & updating the weights for all the examples. •Step 5: Stopping criteria: Stop the training and weights updating process when the minimum error criteria is met 62 statinfer.com
  • 63. Randomly initialize weights 63 x1 x2 h1 𝑦 h2 w11 w21 w12 w13 w22 w23 W2 W1 W3 Step 1: Initialization of weights: Randomly select some weights statinfer.com
  • 64. Training & Activation 64 x1 x2 h1 𝑦 h2 w11 w21 w12 w13 w22 w23 W1 W0 W2 h1= 1 1+𝑒−(𝑤11 +𝑤12 𝑥1 +𝑤22 𝑥2 ) h2= 1 1+𝑒−(𝑤21 +𝑤13 𝑥1 +𝑤23 𝑥2 ) 𝑦 = 1 1 + 𝑒−(𝑊0+𝑊1ℎ1+𝑊2ℎ2) Training input & calculations – Feed Forward Step 2 : Input the training values and perform the calculations forward statinfer.com
  • 65. Error Calculation at Output 65 Step 3: Calculate the error at the outputs. Use the output error to calculate error fractions at each hidden layer ෍ 𝑖=1 𝑛 𝑦𝑖 − 𝑔 ෍ 𝑘=1 𝑚 𝑤 𝑘ℎ 𝑘𝑖 2 x1 x2 h1 𝑦 h2 w11 w21 w12 w13 w22 w23 W2 W1 W3 statinfer.com
  • 66. Error Calculation at hidden layers 66 x1 x2 h1 𝑦 h2 Step 3: Calculate the error at the outputs. Use the output error to calculate error fractions at each hidden layer 𝐸𝑟𝑟 = ෍ 𝑖=1 𝑛 𝑦𝑖 − 𝑔 ෍ 𝑘=1 𝑚 𝑤 𝑘ℎ 𝑘𝑖 2 Back Propagation - Calculate Errors signals backwards 𝛿 𝑘 = 𝑦 1 − 𝑦 ∗ 𝑊 ∗ 𝐸𝑟𝑟 statinfer.com
  • 67. Calculate weight corrections 67 x1 x2 h1 𝑦 h2 Dw11 Dw21 Dw12 Dw13 Dw22 Dw23 Step 4: Update the weights to reduce the error, recalculate and repeat the process DW1 DW0 DW2 𝐸𝑟𝑟 = ෍ 𝑖=1 𝑛 𝑦𝑖 − 𝑔 ෍ 𝑘=1 𝑚 𝑤 𝑘ℎ 𝑘𝑖 2 𝛿 𝑘 = 𝑦 1 − 𝑦 ∗ 𝑊 ∗ 𝐸𝑟𝑟 statinfer.com
  • 68. Update Weights 68 x1 x2 h1 𝑦 h2 w11 :=w11+Dw11 w12 :=w12+Dw12 w22 :=w22+Dw22 w13 :=w13+Dw13 w21 :=w21+Dw21 Step 4: Update the weights to reduce the error, recalculate and repeat the process W1:=W1+DW1 W0:=W0+DW0 W2:=W2+DW2 𝐸𝑟𝑟 = ෍ 𝑖=1 𝑛 𝑦𝑖 − 𝑔 ෍ 𝑘=1 𝑚 𝑤 𝑘ℎ 𝑘𝑖 2 𝛿 𝑘 = 𝑦 1 − 𝑦 ∗ 𝑊 ∗ 𝐸𝑟𝑟 statinfer.com
  • 69. Stopping Criteria 69 x1 x2 h1 𝑦 h2 w11 w21 w12 w13 w22 w23 W1 W0 W2 ෍ 𝒊=𝟏 𝒏 𝒚𝒊 − 𝒈 ෍ 𝒌=𝟏 𝒎 𝒘 𝒌 𝒉 𝒌𝒊 𝟐 Step 5: Stop the training and weights updating process when the minimum error criteria is met statinfer.com
  • 70. Once Again ….Neural network Algorithm •Step 1: Initialization of weights: Randomly select some weights •Step 2 : Training & Activation: Input the training values and perform the calculations forward. •Step 3 : Error Calculation: Calculate the error at the outputs. Use the output error to calculate error fractions at each hidden layer •Step 4: Weight training : Update the weights to reduce the error, recalculate and repeat the process of training & updating the weights for all the examples. •Step 5: Stopping criteria: Stop the training and weights updating process when the minimum error criteria is met 70 statinfer.com
  • 72. Neural network Algorithm-Demo Looks like a dataset that can’t be separated by using single linear decision boundary/perceptron 72 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 1 1 1 1 1 1 1 1 1 00 00 0 0 00 0 0 00 00 00 0 0 0 0 0 00 00 00 0 0 00 00 00 00 00 0 0 00 0 0 0 0 0 0 0 0 11 1 0 00 00 0 0 00 0 00 00 00 0 0 0 0 0 00 00 0 0 00 00 00 00 00 0 0 00 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 00 00 0 0 00 0 00 00 00 0 0 0 0 0 0 00 00 000 0 0 00 0 0 0 1 00 0 0 00 00 00 0 0 00 00 0 0 00 00 00 00 00 0 0 00 0 0 1 statinfer.com
  • 73. Neural network Algorithm-Demo •Lets consider a similar but simple classification example •XOR Gate Dataset 73 Input1(x1) Input2(x2) Output(y) 1 1 0 1 0 1 0 1 1 0 0 0 0 0 1 1 statinfer.com
  • 74. Randomly initialize weights 74 x1 x2 h1 𝑦 h2 0.5 Step 1: Initialization of weights: Randomly select some weights statinfer.com
  • 75. Activation 75 1 1 0.818 0.7137126 0.731 0.5 h1= 1 1+𝑒−(𝑤11 +𝑤12 𝑥1 +𝑤22 𝑥2 ) h2= 1 1+𝑒−(𝑤21 +𝑤13 𝑥1 +𝑤23 𝑥2 ) 𝑦 = 1 1 + 𝑒−(𝑊0+𝑊1ℎ1+𝑊2ℎ2) input1 input2 output 1 1 0 • In this step we input 1 and 1 as input & expect 0 as output. • With these weights we got an error of - 0.714 at output layer • We need to adjust weights statinfer.com
  • 76. Back-Propagate Errors 76 Ytarget-Yobs -0.7137126 Y(1-Y) 0.20432693 𝛿 at Y -0.1458307 1 1 h1 𝑦 h2 0.5 𝛿 at Y -0.1458307 h2(1-h2) 0.1966119 𝑊 1 𝛿 at h2 -0.0286721 𝛿 at Y -0.1458307 h1(1-h1) 0.1491465 𝑊 -1 𝛿 at h1 0.0217501 𝑊𝑗𝑘 ∶= 𝑊𝑗𝑘 + ∆𝑊𝑗𝑘 𝑤ℎ𝑒𝑟𝑒 ∆𝑊𝑗𝑘 = 𝜂. 𝑦j 𝛿 𝑘 𝜂 is the learning parameter 𝛿 𝑘 = 𝑦 𝑘(1 − 𝑦 𝑘) ∗ 𝐸𝑟𝑟 (for hidden layers𝛿 𝑘 = 𝑦 𝑘(1 − 𝑦 𝑘) ∗ 𝑤𝑗 ∗ 𝐸𝑟𝑟) Err=Expected output-Actual output statinfer.com
  • 77. Calculate Weight Corrections 77 1 1 𝛿 at h1 0.0217501 𝛿 at Y −0.1458307 𝛿 at h2 −0.02867 0.5 ∆𝑊𝑗𝑘 = 𝜂. 𝑦𝑗 𝛿 𝑘 ∆𝑊=0.1*1*0.0217501 ∆𝑊=0.00217501 ∆𝑊𝑗𝑘 = 𝜂. 𝑦𝑗 𝛿 𝑘 ∆𝑊=0.1*1*−0.02867 ∆𝑊=-0.002867 𝑊𝑗𝑘 ∶= 𝑊𝑗𝑘 + ∆𝑊𝑗𝑘 𝑤ℎ𝑒𝑟𝑒 ∆𝑊𝑗𝑘 = 𝜂. 𝑦j 𝛿 𝑘 𝜂 is the learning parameter 𝛿 𝑘 = 𝑦 𝑘(1 − 𝑦 𝑘) ∗ 𝐸𝑟𝑟 (for hidden layers𝛿 𝑘 = 𝑦 𝑘(1 − 𝑦 𝑘) ∗ 𝑤𝑗 ∗ 𝐸𝑟𝑟) Err=Expected output-Actual output statinfer.com
  • 78. Updated Weights 78 1 1 𝛿 at h1 0.0217501 𝛿 at Y −0.1458307 𝛿 at h2 −0.02867 0.502175 𝑊𝑗𝑘 ∶= 𝑊𝑗𝑘 + ∆𝑊𝑗𝑘 W(new)=W(old)+Correction W(new) =0.502175 𝑊𝑗𝑘 ∶= 𝑊𝑗𝑘 + ∆𝑊𝑗𝑘 W(new)=W(old)+Correction W(new) =-1.002867 𝑊𝑗𝑘 ∶= 𝑊𝑗𝑘 + ∆𝑊𝑗𝑘 𝑤ℎ𝑒𝑟𝑒 ∆𝑊𝑗𝑘 = 𝜂. 𝑦j 𝛿 𝑘 𝜂 is the learning parameter 𝛿 𝑘 = 𝑦 𝑘(1 − 𝑦 𝑘) ∗ 𝐸𝑟𝑟 (for hidden layers𝛿 𝑘 = 𝑦 𝑘(1 − 𝑦 𝑘) ∗ 𝑤𝑗 ∗ 𝐸𝑟𝑟) Err=Expected output-Actual output statinfer.com
  • 80. Iterations and Stopping Criteria •This iteration is just for one training example (1,1,0). This is just the first epoch. •We repeat the same process of training and updating of weights for all the data points •We continue and update the weights until we see there is no significant change in the error or when the maximum permissible error criteria is met. •By updating the weights in this method, we reduce the error slightly. When the error reaches the minimum point the iterations will be stopped and the weights will be considered as optimum for this training set 80 statinfer.com
  • 81. XOR Gate final NN Model 81 statinfer.com
  • 83. The good news is.. •We don't need to write the code for calculating and updating the weights •There are ready-made functions, libraries and packages available in R •The gradient descent method is not easy to grasp for non-mathematics students •Neural network tools don't expect the user to write the full back-propagation algorithm from scratch 83 statinfer.com
  • 84. Building the neural network in R •A couple of packages are available in R • nnet • neuralnet •We need to supply the dataset, the input and output variables, and the number of hidden nodes •Neural network calculations are very complex, so the algorithm may take some time to produce results •One needs to be careful while setting the parameters; the runtime changes based on the input parameter values 84 statinfer.com
  • 85. LAB: Building the neural network in R
  • 86. LAB: Building the neural network in R •Build a neural network for XOR data 86 statinfer.com
  • 87. Code: Building the neural network in R 87 statinfer.com
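  The code on this slide is an image in the original deck. A plausible reconstruction with the neuralnet package (the seed and data-frame name are assumptions; the parameter values follow the options slide that comes next):

  library(neuralnet)

  # XOR truth table as a data frame
  xor_data <- data.frame(input1 = c(1, 1, 0, 0),
                         input2 = c(1, 0, 1, 0),
                         output = c(0, 1, 1, 0))

  # One hidden layer with two nodes; classification, so linear.output = FALSE
  set.seed(100)   # results depend on the random starting weights
  xor_nn <- neuralnet(output ~ input1 + input2,
                      data = xor_data,
                      hidden = 2,
                      stepmax = 1e+07,
                      threshold = 0.00001,
                      linear.output = FALSE)

  plot(xor_nn)                       # network diagram with the final weights
  round(xor_nn$net.result[[1]])      # predicted classes for the four rows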
  • 88. R Code Options • neuralnet(Productivity~Age+Experience, data=Emp_Productivity_raw, hidden=2, stepmax=1e+07, threshold=0.00001, linear.output=FALSE) •hidden: the number of nodes in the hidden layer. Pass a vector (e.g., hidden=c(4,2)) to add more hidden layers •stepmax: • The maximum number of steps while executing the algorithm • Sometimes we may need more than 100,000 steps for the algorithm to converge • Sometimes we may get an error like "Algorithm didn't converge within the default stepmax"; we need to increase the stepmax parameter value in such cases • Additional info • One epoch is one complete pass over the training data. If epochs=500, the algorithm sees the entire dataset 500 times • An iteration (step) is one pass of a "batch" of data through the algorithm. If the batch size equals the full training data, then the number of iterations equals the number of epochs 88 statinfer.com
  • 89. R Code Options •threshold • Connected to the weight optimization of the error function • By default, neuralnet stops once the partial derivatives of the error function fall below 0.01 • It serves as a stopping criterion: if the partial derivatives of the error function reach this threshold, the algorithm stops • A lower threshold forces more iterations and usually a closer fit •linear.output: the output is treated as linear (regression) by default; we need to set linear.output=FALSE explicitly for classification problems 89 statinfer.com
  • 90. Code: Building the neural network in R 90 Execute a couple of times to get zero error statinfer.com
  • 91. Code: Building the neural network in R 91 statinfer.com
  • 92. Code: Building the neural network in R 92 statinfer.com
  • 93. Code: Building the neural network in R 93 statinfer.com
  • 94. Lab: Building a Neural Network on Employee Productivity Data •Dataset: Emp_Productivity/Emp_Productivity.csv •Draw a 2D graph of age and experience, marked by productivity class •Build a neural network to predict productivity based on age and experience •Plot the neural network with the final weights •Increase the number of hidden nodes and observe the change in accuracy 94 statinfer.com
  • 95. Code: Neural network on Employee productivity data 95 statinfer.com
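  The code slides that follow are images in the original deck. A plausible reconstruction (the file path and the plotting choices are assumptions; column names follow the lab description):

  library(neuralnet)

  Emp_Productivity_raw <- read.csv("Emp_Productivity/Emp_Productivity.csv")

  # 2D view: age vs experience, coloured by productivity class
  plot(Emp_Productivity_raw$Age, Emp_Productivity_raw$Experience,
       col = Emp_Productivity_raw$Productivity + 1,
       xlab = "Age", ylab = "Experience")

  # Neural network with two hidden nodes
  emp_nn <- neuralnet(Productivity ~ Age + Experience,
                      data = Emp_Productivity_raw,
                      hidden = 2,
                      linear.output = FALSE)
  plot(emp_nn)    # the network with the final weights

  # Confusion matrix and accuracy on the training data
  pred <- round(emp_nn$net.result[[1]])
  conf <- table(Actual = Emp_Productivity_raw$Productivity, Predicted = pred)
  conf
  sum(diag(conf)) / sum(conf)    # accuracy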
  • 96. Code: Neural network on Employee productivity data 96 statinfer.com
  • 97. Code: Neural network on Employee productivity data 97 statinfer.com
  • 98. Code: Neural network on Employee productivity data 98 statinfer.com
  • 99. Code: Neural network on Employee productivity data 99 statinfer.com
  • 100. Code: Neural network on Employee productivity data 100 statinfer.com
  • 101. Code: Neural network on Employee productivity data 101 statinfer.com
  • 102. There can be many solutions 102 Set-1 statinfer.com
  • 103. There can be many solutions 103 Set-2 statinfer.com
  • 104. There can be many solutions 104 Set-3 statinfer.com
  • 105. Local vs. Global Minimum
  • 106. Local vs. Global Minimum • The neural network might give different results with different starting weights • The algorithm can settle in a local minimum rather than the global minimum of the error function • There can be many local minima, which means there can be many solutions to a neural network problem • We need to perform validation checks before choosing the final model [Diagram: error curve with a local minimum and a global minimum] 106 statinfer.com
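  One way to see this in practice: the rep argument of neuralnet refits the same network several times from different random starting weights, and the converged error can differ run to run (a sketch, reusing the xor_data frame built earlier):

  set.seed(1)
  xor_multi <- neuralnet(output ~ input1 + input2,
                         data = xor_data, hidden = 2,
                         rep = 5, linear.output = FALSE)
  xor_multi$result.matrix["error", ]   # one error value per repetition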
  • 107. Hidden layers and their role
  • 108. Multi-Layer Neural Network 108 [Network diagram: inputs x1, x2 (plus a bias node) feed the first hidden layer H11, H12; a second hidden layer H21, H22 follows; the final node is the output Y] Input → Hidden Layers → Output statinfer.com
  • 109. The role of hidden layers 109 • The first hidden layer • The first layer is nothing but a set of linear decision boundaries • Its nodes are simple logistic regression outputs • We can see them as multiple lines on the decision space [Scatter plot: the same non-linearly-separable classes, with candidate separating lines] statinfer.com
  • 110. The role of hidden layers 110 • The second hidden layer • The second layer combines these lines and forms simple decision-boundary shapes • A third hidden layer forms even more complex shapes within the boundaries generated by the second layer • You can imagine all these layers together dividing the whole objective space into multiple decision-boundary shapes; cases inside a shape are class-1, cases outside are class-0 [Scatter plot: the same classes enclosed by combined boundary shapes] statinfer.com
  • 111. The Number of hidden layers
  • 112. The Number of hidden layers •There is no concrete rule for choosing the right number; we need to choose by trial-and-error validation •Too few hidden nodes might result in underfit models with a high error rate •Too many hidden layers might lead to over-fitting, which can be identified using validation techniques •The final number depends on the number of predictor variables, the training data size and the complexity of the target •When in doubt, it is better to go with more hidden nodes than fewer; it helps accuracy, though the training process will be slower •Cross-validation and testing error can help us determine the model with the optimal hidden layers 112 statinfer.com
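  A quick trial-and-error sketch along these lines (reusing the Emp_Productivity_raw data frame from the earlier lab; note that training error alone favours the bigger networks, so a held-out test set or cross-validation should make the final call):

  # Compare a few hidden-layer configurations by training error
  for (h in list(1, 2, 3, c(3, 2))) {
    nn_fit <- neuralnet(Productivity ~ Age + Experience,
                        data = Emp_Productivity_raw,
                        hidden = h, linear.output = FALSE)
    cat("hidden =", paste(h, collapse = ","),
        " error =", nn_fit$result.matrix["error", 1], "\n")
  }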
  • 114. LAB: Digit Recognizer • Take an image of a handwritten single digit and determine what that digit is • Normalized handwritten digits, automatically scanned from envelopes by the U.S. Postal Service. The original scanned digits are binary and of different sizes and orientations; the images here have been deslanted and size-normalized, resulting in 16 x 16 grayscale images (Le Cun et al., 1990) • The data are in two gzipped files, and each line consists of the digit id (0-9) followed by the 256 grayscale values • Build a neural network model that can be used as the digit recognizer • Use the test dataset to validate the true classification power of the model • What is the final accuracy of the model? 114 statinfer.com
  • 115. Code: Digit Recognizer
  #Importing test and training data - USPS Data
  digits_train <- read.table("D:/Google Drive/Training/Datasets/Digit Recognizer/USPS/zip.train.txt", quote="\"", comment.char="")
  digits_test <- read.table("D:/Google Drive/Training/Datasets/Digit Recognizer/USPS/zip.test.txt", quote="\"", comment.char="")
  dim(digits_train)
  col_names <- names(digits_train[,-1])
  label_levels <- names(table(digits_train$V1))
  #Let's see some images: plot the first ten digits as 16x16 pixel grids
  for(i in 1:10)
  {
    data_row <- digits_train[i,-1]
    pixels <- matrix(as.numeric(data_row), 16, 16, byrow=TRUE)
    image(pixels, axes = FALSE)
    title(main = paste("Label is", digits_train[i,1]), font.main = 4)
  }
  115 statinfer.com
  • 117. Code: Digit Recognizer
  ##### Creating multiple indicator columns, one per digit (multiple outputs)
  ##### We need these variables while building the model
  digit_labels <- data.frame(label=digits_train[,1])
  for (i in 1:10)
  {
    digit_labels <- cbind(digit_labels, digit_labels$label==i-1)
    names(digit_labels)[i+1] <- paste("l", i-1, sep="")
  }
  label_names <- names(digit_labels[,-1])
  #Update the training dataset
  digits_train1 <- cbind(digits_train, digit_labels)
  names(digits_train1)
  #The formula y~. doesn't work in the neuralnet function, so build it explicitly
  model_form <- as.formula(paste(paste(label_names, collapse = " + "), "~", paste(col_names, collapse = " + ")))
  117 statinfer.com
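  The remaining model-fitting and validation code is not reproduced in this extract. A plausible continuation, assuming the objects built above (hidden = 10 is an illustrative choice, not necessarily the deck's setting):

  # Fit the network on the ten indicator columns; this can take a while
  digit_model <- neuralnet(model_form,
                           data = digits_train1,
                           hidden = 10,
                           linear.output = FALSE)

  # Score the test data: predicted label = output node with highest activation
  test_pred <- compute(digit_model, digits_test[, -1])
  pred_label <- max.col(test_pred$net.result) - 1

  # Final accuracy on the test set
  mean(pred_label == digits_test$V1)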
  • 124. Real-world applications •Self-driving cars, taking video as the input •Speech recognition •Face recognition •Cancer cell analysis •Heart attack prediction •Currency and stock price prediction •Credit card default and loan prediction •Marketing and advertising, by predicting response probabilities •Weather forecasting and rainfall prediction 124 statinfer.com
  • 125. Real-world applications •Face recognition : • https://www.youtube.com/watch?v=57VkfXqJ1LU • https://www.youtube.com/watch?v=xVQLBbXdVUY •Autonomous car software • https://www.youtube.com/watch?v=gG72-SjwxAM 125 statinfer.com
  • 126. Drawbacks of Neural Networks
  • 127. Drawbacks of Neural Networks •No real theory explains how to choose the number of hidden layers •Training takes a lot of time when the input data is large and needs powerful computing machines •The results are difficult to interpret; it is very hard to measure the impact of individual predictors •It is not easy to choose the right training sample size and learning rate •The local minimum issue: the gradient descent algorithm produces the optimal weights for a local minimum; the global minimum of the error function is not guaranteed 127 statinfer.com
  • 128. Why the name neural network?
  • 129. Why the name neural network? •The neural network algorithm for solving complex learning problems is inspired by the human brain •Our brains are a huge network of processing elements, containing billions of neurons •In the brain, a neuron receives input from other neurons; the inputs are combined and sent to the next neuron •The artificial neural network algorithm is built on the same logic 129 statinfer.com
  • 130. Why the name neural network? 130 Dendrites → Input (X); Cell body → Processor (Σwx); Axon → Output (Y) statinfer.com
  • 132. Conclusion •Neural networks are a vast subject; many data scientists focus solely on neural network techniques •In this session we practiced only the introductory concepts. Neural networks have many more advanced techniques, and there are many algorithms other than back propagation •Neural networks work particularly well on certain classes of problems, such as image recognition •Neural network algorithms are very computation-intensive and require highly efficient machines. Large datasets take a significant amount of runtime in R, so we need to try different options and packages •There is currently a lot of exciting research going on around neural networks •After gaining sufficient knowledge in this basic session, you may want to explore reinforcement learning, deep learning, etc. 132 statinfer.com
  • 134. Math- How to update the weights?
  • 135. Math - How to update the weights? •We update the weights backwards, iteratively calculating the error •The weight update uses the gradient descent method, via the delta rule (also known as the Widrow-Hoff rule) •First we calculate the weight corrections for the output layer, then we take care of the hidden layers 135 statinfer.com
  • 136. Math - How to update the weights?
  • Wjk := Wjk + ΔWjk
  • where ΔWjk = η·yj·δk
  • η is the learning parameter
  • δk = yk(1-yk)·Err (for hidden layers δk = yk(1-yk)·wj·Err)
  • Err = Expected output - Actual output
  •The weight corrections are calculated based on the error function
  •The new weights are chosen in such a way that the final error in the network is minimized 136 statinfer.com
  • 137. Math-How does the delta rule work?
  • 138. How does the delta rule work? • Let's consider a simple example to understand weight updating using the delta rule 138 • Suppose we are building a simple logistic regression line and want to find the weight using the weight update rule • Y = 1/(1+e^-wx) is the equation; we are searching for the optimal w for our data • Let w be 1, so Y = 1/(1+e^-x) is the initial equation; the error in this initial step is 3.59 • To reduce the error we add a delta to w and make it 1.5 • Now w is 1.5 (the blue line) and Y = 1/(1+e^-1.5x) is the updated equation; with the updated weight, the error is 1.57 • We can further reduce the error by increasing w by another delta statinfer.com
  • 139. How does the delta rule work? 139 • If we repeat the same process of adding a delta and updating the weight, we finally end up with the minimum error • The weight at that final step is the optimal weight • In this example the optimal weight is 8, where the error is 0, so Y = 1/(1+e^-8x) is the final equation • Here we changed the weight manually to reduce the error. This is just for intuition; manual updating is not feasible for complex optimization problems • Gradient descent is a systematic optimization method: we update the weights by calculating the gradient of the error function statinfer.com
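  A sketch of this manual search in R. The toy sample below is an assumption; the deck does not list the data behind the 3.59 and 1.57 error values, so the numbers here will differ, but the pattern (error shrinking as w grows) is the same:

  sigmoid <- function(z) 1 / (1 + exp(-z))
  x <- c(-2, -1, 1, 2)            # assumed toy inputs
  y <- c(0, 0, 1, 1)              # assumed class labels

  # Sum-of-squared-errors for a given weight w in Y = 1/(1+e^-wx)
  sse <- function(w) sum((y - sigmoid(w * x))^2)

  sapply(c(1, 1.5, 8), sse)       # the error shrinks as w grows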
  • 141. How does gradient descent work? •Gradient descent is one of the best-known ways to find a local minimum •By changing the weights we move towards the minimum value of the error function; the weights are changed by taking steps in the negative direction of the function's gradient (derivative) 141 [Diagram: error plotted against weight, with steps descending towards the minimum] statinfer.com
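  Continuing the toy example above, each gradient-descent update moves w a small step in the negative direction of the error gradient (a numerical gradient keeps the sketch short; eta is an assumed learning rate):

  grad <- function(w, eps = 1e-6) (sse(w + eps) - sse(w - eps)) / (2 * eps)

  w <- 1                      # starting weight
  eta <- 0.5                  # learning rate (assumed)
  for (i in 1:200) {
    w <- w - eta * grad(w)    # step against the gradient
  }
  w; sse(w)                   # w has moved towards the low-error region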
  • 143. Does this method really work? • We changed the weights; did it reduce the overall error? • Let's calculate the error with the new weights and see the change 143 [Diagram: the network re-evaluated with the updated weights: inputs (1, 1), h1 = 0.818545647, h2 = 0.729364041, ŷ = 0.706552799, first weight now 0.50218] statinfer.com
  • 144. Gradient Descent Method Validation •With our initial set of weights the overall error was 0.7137: Y actual is 0, Y predicted is 0.7137, so error = 0.7137 •The new weights give us a predicted value of 0.70655 •In one iteration, we reduced the error from 0.7137 to 0.70655, a reduction of about 1% •Repeating the same process over multiple epochs and training examples reduces the error further 144
                   input1  input2  Output(Y-Actual)  Y Predicted  Error
  Old weights      1       1       0                 0.71371259   0.71371259
  Updated weights  1       1       0                 0.706552799  0.706552799
  statinfer.com
  • 146. Statinfer.com statinfer.com 146 Download the course videos and handouts from the below link https://statinfer.com/course/machine-learning-with-r-2/curriculum/?c=b433a9be3189