15. Linearly Separable – AND gate
• Two input sources => two input neurons
• One output => one output neuron
• Activation function is binary sigmoidal: f(x) = 1 / (1 + e^(-x))
• Derivative: f'(x) = f(x) (1 - f(x))
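For reference, a minimal Python sketch of this activation and its derivative (the function names are illustrative, not from the slides):

import numpy as np

def sigmoid(x):
    # Binary sigmoidal activation: f(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    # Derivative: f'(x) = f(x) * (1 - f(x))
    fx = sigmoid(x)
    return fx * (1.0 - fx)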
17. Back-propagation training/algorithm
Given: input vector [x1 x2] at the i-th instant, target t.
Initialize weights w0, w1, w2 and learning rate α with some random values in the range [0, 1].
1. Output: y_in = w0 + x1 w1 + x2 w2
2. Apply the sigmoidal activation function: y = f(y_in) = 1 / (1 + e^(-y_in))
3. Compute error: e = t - y
4. Backpropagate the error across the activation function: δ = e · f'(y_in),
where f'(.) is the derivative of the selected activation function;
for the sigmoidal activation function, f'(y_in) = f(y_in) (1 - f(y_in)) = y (1 - y).
18. Back-propagation training/algorithm
5. Compute the change in weights and bias:
Δw1 = α δ x1, Δw2 = α δ x2, Δw0 = α δ
6. Update the weights and bias:
w1(new) = w1(old) + Δw1, w2(new) = w2(old) + Δw2, w0(new) = w0(old) + Δw0
7. Keep repeating steps 1-6 for all four input combinations. This is one epoch.
8. Run multiple epochs until the error decreases and stabilizes.
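A minimal sketch of this training loop for the AND gate, assuming NumPy, a learning rate of 0.5, and random initial weights in [0, 1] (the exact values are not fixed by the slides):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# AND-gate truth table: four input combinations and their targets
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
T = np.array([0, 0, 0, 1])

rng = np.random.default_rng(0)
w = rng.uniform(0, 1, size=2)      # w1, w2
w0 = rng.uniform(0, 1)             # bias
alpha = 0.5                        # learning rate (assumed value)

for epoch in range(10000):
    for x, t in zip(X, T):
        y_in = w0 + x @ w              # step 1: net input
        y = sigmoid(y_in)              # step 2: activation
        e = t - y                      # step 3: error
        delta = e * y * (1 - y)        # step 4: error across the activation
        w += alpha * delta * x         # steps 5-6: weight update
        w0 += alpha * delta            # steps 5-6: bias update

print("weights:", w, "bias:", w0)
print("outputs:", [round(float(sigmoid(w0 + x @ w)), 3) for x in X])

After enough epochs the four outputs approach the targets 0, 0, 0, 1 (a sigmoid output can only approach, never reach, 0 or 1).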
19. Matrix Notation – AND gate
Forward pass: Y_in = X W + w0, where X holds the four input patterns row-wise and W = [w1 w2]ᵀ
After activation function: Y = f(Y_in) = 1 / (1 + e^(-Y_in))
Loss function: L = ½ Σ (T - Y)²
20. Matrix Notation – AND gate
• Backpropagation: δ = (T - Y) ⊙ Y ⊙ (1 - Y) (element-wise)
• Update weights: ΔW = α Xᵀ δ, Δw0 = α Σ δ
• Update using W(new) = W(old) + ΔW, w0(new) = w0(old) + Δw0
This iterative process continues until convergence.
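A vectorized sketch of the same computation over the full AND-gate batch, under the same assumptions as the earlier sketch:

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [0], [0], [1]], dtype=float)

rng = np.random.default_rng(0)
W = rng.uniform(0, 1, size=(2, 1))   # weight column vector [w1, w2]
w0 = rng.uniform(0, 1)               # bias
alpha = 0.5                          # learning rate (assumed)

for epoch in range(10000):
    Y = 1.0 / (1.0 + np.exp(-(X @ W + w0)))   # forward pass and activation
    delta = (T - Y) * Y * (1 - Y)             # backpropagated error (element-wise)
    W += alpha * X.T @ delta                  # weight update
    w0 += alpha * delta.sum()                 # bias update

print(np.round(Y, 3).ravel())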
21. (4 Rules) Backpropagating Error
[Figure: a single neuron with inputs xi, weights wi, activation f(.), and output y, annotated with the rules listed below]
1. Output Neuron
2. Across Link
3. Weights Update
28. Back-propagation Training
Given: inputs [x1 x2], target t.
Initialize the weights and the learning rate α with some random values.
Feed-forward phase:
1. Hidden unit net input: z_inj = v0j + Σi xi vij, for j = 1 to p hidden neurons
2. Hidden unit output: zj = f(z_inj), sigmoidal activation function
3. Output unit net input: y_ink = w0k + Σj zj wjk
4. Output: yk = f(y_ink), sigmoidal activation function
29. Back-propagation Training
Back-propagation of error phase:
5. Compute the error correction term for each output unit: δk = (tk - yk) f'(y_ink),
where f'(.) is the derivative of the selected activation function.
6. Compute the change in weights and bias: Δwjk = α δk zj, Δw0k = α δk,
and send δk back to the previous layer.
7. Hidden unit: δ_inj = Σk δk wjk
8. Calculate the error term: δj = δ_inj f'(z_inj)
9. Compute the change in weights and bias: Δvij = α δj xi, Δv0j = α δj
30. Back-propagation Training
Weights and bias update phase:
10. Each output unit, k = 1 to m, updates its weights and bias:
wjk(new) = wjk(old) + Δwjk, w0k(new) = w0k(old) + Δw0k
11. Each hidden unit, j = 1 to p, updates its weights and bias:
vij(new) = vij(old) + Δvij, v0j(new) = v0j(old) + Δv0j
12. Check the stopping criterion, e.g. a certain number of epochs or when the
targets are equal/close to the network outputs.
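A compact Python sketch of steps 1-12 for a 2-p-1 network, assuming NumPy, one output neuron (m = 1), p = 2 hidden neurons, and the XOR training set used later in the deck; convergence on XOR depends on the initialization and can take many epochs:

import numpy as np

def f(x):                                   # binary sigmoid
    return 1.0 / (1.0 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # XOR inputs
T = np.array([0, 1, 1, 0], dtype=float)                       # XOR targets

alpha = 0.25                                # learning rate (assumed)
rng = np.random.default_rng(1)
V = rng.uniform(-0.5, 0.5, size=(2, 2))     # input-to-hidden weights vij
v0 = rng.uniform(-0.5, 0.5, size=2)         # hidden biases v0j
W = rng.uniform(-0.5, 0.5, size=2)          # hidden-to-output weights wj
w0 = rng.uniform(-0.5, 0.5)                 # output bias

for epoch in range(20000):
    for x, t in zip(X, T):
        # Feed-forward phase (steps 1-4)
        z = f(v0 + x @ V)
        y = f(w0 + z @ W)
        # Back-propagation of error phase (steps 5-9)
        delta_k = (t - y) * y * (1 - y)
        delta_j = delta_k * W * z * (1 - z)
        # Weights and bias update phase (steps 10-11)
        W += alpha * delta_k * z
        w0 += alpha * delta_k
        V += alpha * np.outer(x, delta_j)
        v0 += alpha * delta_j

print([round(float(f(w0 + f(v0 + x @ V) @ W)), 3) for x in X])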
49. Regularization (to avoid Overfitting)
Overfitting is one of the primary causes of poor generalization.
The objective is to determine a curve that defines the boundary between the two
groups using the training data.
50. Overfitting
51. Overfitting
Some outliers penetrate the area of the other group and disturb the boundary.
Because machine learning considers all of the data, even the noise, it ends up
producing an improper model (a curve in this case). This would be penny-wise and
pound-foolish.
52. Remedy: Regularization
• Regularization is a numerical method that attempts to construct a model structure
that is as simple as possible. The simplified model can avoid the effects of
overfitting at a small cost in performance.
• Cost function (sum of squared errors): J = ½ Σ (t - y)²
53. Remedy: Regularization
• For this reason, overfitting of the neural network can be reduced by adding the
sum of the weights to the cost function:
(new) Cost function: J = ½ Σ (t - y)² + λ Σ |w| (the L1 and L2 variants of this penalty follow)
• In order to drop the value of the cost function, both the error and the weights
should be kept as small as possible.
• However, if a weight becomes small enough, the associated nodes are practically
disconnected. As a result, unnecessary connections are eliminated, and the neural
network becomes simpler.
54. Add L1 Regularization to XOR Network
• New loss function: L = ½ Σ (t - y)² + λ Σ |w|
• The gradient of the regularized loss w.r.t. a weight w is: ∂L/∂w = ∂E/∂w + λ sign(w)
• Update rule for weight w: w(new) = w(old) - α (∂E/∂w + λ sign(w))
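A minimal sketch of this L1 update for a weight array, where grad_E, alpha, and lam are illustrative placeholders for the unregularized gradient, the learning rate, and the regularization strength:

import numpy as np

alpha, lam = 0.25, 0.01            # learning rate and L1 strength (assumed values)
W = np.array([0.4, 0.1])           # example weights
grad_E = np.array([0.05, -0.02])   # dE/dW from ordinary backpropagation (illustrative)

# L1 update: w <- w - alpha * (dE/dw + lam * sign(w))
W -= alpha * (grad_E + lam * np.sign(W))
print(W)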
55. Add L2 Regularization to XOR Network
• New loss function: L = ½ Σ (t - y)² + (λ/2) Σ w²
• The gradient of the regularized loss w.r.t. a weight w is: ∂L/∂w = ∂E/∂w + λ w
• Update rule for weight w: w(new) = w(old) - α (∂E/∂w + λ w)
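The corresponding L2 sketch, with the same illustrative names; the penalty term here is proportional to the weight itself, which is why L2 is often called weight decay:

import numpy as np

alpha, lam = 0.25, 0.01            # learning rate and L2 strength (assumed values)
W = np.array([0.4, 0.1])           # example weights
grad_E = np.array([0.05, -0.02])   # dE/dW from ordinary backpropagation (illustrative)

# L2 update: w <- w - alpha * (dE/dw + lam * w)
W -= alpha * (grad_E + lam * W)
print(W)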
58. Matrix Notation – XOR gate with L2 Regularization
Expanding from Slide 44.
Only the weight updates change, as follows; the bias update rule stays the same
because the biases are not regularized: W(new) = W(old) - α (∂E/∂W + λ W)
This iterative process continues until convergence.
L2 regularization penalizes large weights, resulting in slightly
smaller weight updates compared to the non-regularized case.
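A small sketch of this matrix-form update, with illustrative gradients, showing that the weight matrix receives the extra λW term while the bias vector does not:

import numpy as np

alpha, lam = 0.25, 0.001                          # learning rate and L2 strength (assumed)
W = np.array([[0.6, -0.3], [-0.1, 0.4]])          # example weight matrix
b = np.array([0.3, 0.5])                          # example bias vector
grad_W = np.array([[0.01, 0.0], [0.02, -0.01]])   # dE/dW from backpropagation (illustrative)
grad_b = np.array([0.01, 0.005])                  # dE/db (illustrative)

W -= alpha * (grad_W + lam * W)   # weights: gradient plus the L2 penalty term
b -= alpha * grad_b               # biases: plain gradient step, no regularization
print(W, b)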
66. Appendix: Example Implementation
Using a back-propagation network, find the new weights for the network shown aside.
The input is [0 1] and the target output is 1. Use a learning rate of 0.25 and the
binary sigmoidal activation function.
67. 1. Consolidate the information
• Given: inputs [0 1], target 1.
• Hidden-layer weights: [v11 v21 v01] = [0.6 -0.1 0.3], [v12 v22 v02] = [-0.3 0.4 0.5]
• Output-layer weights: [w1 w2 w0] = [0.4 0.1 -0.2]
• Learning rate α = 0.25
• Activation function is binary sigmoidal: f(x) = 1 / (1 + e^(-x))
• Derivative: f'(x) = f(x) (1 - f(x))
68. 2. Feed-forward Phase
1. Hidden unit net inputs, j = 1, 2:
z_in1 = v01 + x1 v11 + x2 v21 = 0.3 + 0(0.6) + 1(-0.1) = 0.2
z_in2 = v02 + x1 v12 + x2 v22 = 0.5 + 0(-0.3) + 1(0.4) = 0.9
2. Hidden unit outputs, sigmoidal activation function:
z1 = f(0.2) = 0.5498, z2 = f(0.9) = 0.7109
3. Output unit net input:
y_in = w0 + z1 w1 + z2 w2 = -0.2 + 0.5498(0.4) + 0.7109(0.1) = 0.0910
4. Output, sigmoidal activation function: y = f(0.0910) = 0.5227
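A few lines to cross-check the feed-forward numbers above (NumPy assumed):

import numpy as np

f = lambda x: 1.0 / (1.0 + np.exp(-x))
x1, x2 = 0, 1
v11, v21, v01 = 0.6, -0.1, 0.3
v12, v22, v02 = -0.3, 0.4, 0.5
w1, w2, w0 = 0.4, 0.1, -0.2

z1 = f(v01 + x1 * v11 + x2 * v21)   # ~0.5498
z2 = f(v02 + x1 * v12 + x2 * v22)   # ~0.7109
y = f(w0 + z1 * w1 + z2 * w2)       # ~0.5227
print(round(z1, 4), round(z2, 4), round(y, 4))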
70. 3. Back-propagation of error Phase
5. Compute the error correction term for the output unit:
δk = (t - y) f'(y_in) = (1 - 0.5227)(0.5227)(1 - 0.5227) = 0.1191
6. Compute the change in weights and bias:
Δw1 = α δk z1 = 0.25(0.1191)(0.5498) = 0.0164
Δw2 = α δk z2 = 0.25(0.1191)(0.7109) = 0.0212
Δw0 = α δk = 0.25(0.1191) = 0.0298
7. Hidden unit error inputs:
δ_in1 = δk w1 = 0.1191(0.4) = 0.0476
δ_in2 = δk w2 = 0.1191(0.1) = 0.0119
75. 3. Back-propagation of error Phase
8. Calculate the error terms for the hidden units:
δ1 = δ_in1 f'(z_in1) = 0.0476(0.5498)(1 - 0.5498) = 0.0118
δ2 = δ_in2 f'(z_in2) = 0.0119(0.7109)(1 - 0.7109) = 0.00245
9. Compute the change in weights and bias:
Δv11 = α δ1 x1 = 0.25(0.0118)(0) = 0
Δv21 = α δ1 x2 = 0.25(0.0118)(1) = 0.0029
Δv01 = α δ1 = 0.25(0.0118) = 0.0029
Δv12 = α δ2 x1 = 0.25(0.00245)(0) = 0
Δv22 = α δ2 x2 = 0.25(0.00245)(1) = 0.0006
Δv02 = α δ2 = 0.25(0.00245) = 0.0006
79. 4. Weights and Bias update phase
10. Each output unit, k = 1 to m, updates its weights and bias:
w1(new) = w1(old) + Δw1 = 0.4 + 0.0164 = 0.4164
w2(new) = w2(old) + Δw2 = 0.1 + 0.0212 = 0.1212
w0(new) = w0(old) + Δw0 = -0.2 + 0.0298 = -0.1702
81. 4. Weights and Bias update phase
11. Each hidden unit, j = 1 to p, updates its weights and bias:
v11(new) = 0.6 + 0 = 0.6, v21(new) = -0.1 + 0.0029 ≈ -0.097
v01(new) = 0.3 + 0.0029 ≈ 0.303
v12(new) = -0.3 + 0 = -0.3, v22(new) = 0.4 + 0.0006 ≈ 0.401
v02(new) = 0.5 + 0.0006 ≈ 0.501
84.
Epoch | v11 | v21    | v01   | v12  | v22   | v02
0     | 0.6 | -0.1   | 0.3   | -0.3 | 0.4   | 0.5
1     | 0.6 | -0.097 | 0.303 | -0.3 | 0.401 | 0.501

Epoch | z1     | z2     | w1    | w2    | w0    | y
0     | 0.549  | 0.711  | 0.4   | 0.1   | -0.2  | 0.523
1     | 0.5513 | 0.7113 | 0.416 | 0.121 | -0.17 | 0.5363

Write a program for this case and cross-verify your answers.
After how many epochs will the output converge?
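A sketch of the requested program, assuming NumPy and treating "converged" as the output coming within 0.01 of the target (the slides do not fix a convergence threshold):

import numpy as np

f = lambda x: 1.0 / (1.0 + np.exp(-x))

x = np.array([0.0, 1.0]); t = 1.0; alpha = 0.25
V = np.array([[0.6, -0.3],      # row 1: v11, v12
              [-0.1, 0.4]])     # row 2: v21, v22
v0 = np.array([0.3, 0.5])       # v01, v02
W = np.array([0.4, 0.1])        # w1, w2
w0 = -0.2

for epoch in range(1, 1000001):
    # Feed-forward phase
    z = f(v0 + x @ V)
    y = f(w0 + z @ W)
    # Back-propagation of error phase
    delta_k = (t - y) * y * (1 - y)
    delta_j = delta_k * W * z * (1 - z)
    # Weights and bias update phase
    W += alpha * delta_k * z
    w0 += alpha * delta_k
    V += alpha * np.outer(x, delta_j)
    v0 += alpha * delta_j
    if epoch == 1:
        print("after epoch 1:", np.round(V, 3), np.round(v0, 3), np.round(W, 3), round(w0, 3))
    if abs(t - y) < 0.01:
        print("converged after", epoch, "epochs")
        break

The values printed after epoch 1 should match the tables above, allowing for rounding.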