3. Guideline
• What is Neuron - Perceptron/Sigmoid
• Neural Network Algorithm and Related Concept
• Feed Forward Propagation
• Cost Function
• Gradient Descent
• An Example of a Neural Network Calculation
• Back Propagation
5. Artificial Neural Network
What are neurons?
How are they connected,
and how do they communicate?
#Simulates the brain's decision-making process
#Real neurons are remarkably complex
#A neuron senses changes in its environment - input
#It then passes the signal on to other neurons - propagate
#And directs the group to make a response - output
13. Sigmoid Neuron
• A small change in the weights and biases of a perceptron in the
network can cause the output to completely FLIP
• 0 -> 1, 1 -> 0 (a previously wrong example becomes right, but a right one becomes wrong)
• A sigmoid neuron makes the output change only slightly when the
weights/biases change slightly (see the sketch below)
#Now I'm right, now I'm wrong - go ahead, hit me, dummy
#It doesn't just flip with every little nudge
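A minimal sketch (not from the slides) of the contrast: with the same tiny weight change, the perceptron's step output flips from 0 to 1, while the sigmoid neuron's output barely moves. All values are made up for illustration.

import math

def perceptron(w, x, b):
    # step output: flips from 0 to 1 the instant w·x + b crosses 0
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def sigmoid_neuron(w, x, b):
    # smooth output: a small change in w or b gives a small change in output
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

x = [1.0, 1.0]
print(perceptron([0.5, -0.501], x, 0.0), perceptron([0.5, -0.499], x, 0.0))        # 0 -> 1 (flips)
print(sigmoid_neuron([0.5, -0.501], x, 0.0), sigmoid_neuron([0.5, -0.499], x, 0.0))  # ~0.4998 -> ~0.5003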
17. Design a Network:
Recognition of Handwritten Digits
• Input layer contains neurons encoding the values of the
input pixels
• Values in greyscale: 0.0 -> White, 1.0 -> Black
• N neurons in the hidden layer
• Output layer contains 10 neurons, each outputting a value between 0 and 1
• Check the activation value of each neuron
• i.e. if the first neuron's output ≈ 1, it fires, indicating the digit is "0"
#We could use just 4 outputs emitting 0/1 to encode the digit in binary - why not do that?
#Because 4 outputs do not learn as efficiently as 10 outputs (the two encodings are sketched below)
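The note above asks about 4 binary outputs versus 10 outputs; a small sketch, with made-up activation values, of how the digit would be read off under each encoding:

import numpy as np

# 10-output encoding: one neuron per digit, take the most activated one
out10 = np.array([0.02, 0.05, 0.01, 0.03, 0.90, 0.10, 0.02, 0.04, 0.03, 0.01])
digit_from_10 = int(np.argmax(out10))               # -> 4

# 4-output binary encoding: each neuron is one bit of the digit
out4 = np.array([0.1, 0.9, 0.2, 0.8])               # threshold each output at 0.5
bits = (out4 > 0.5).astype(int)                     # -> [0, 1, 0, 1]
digit_from_4 = int("".join(map(str, bits)), 2)      # -> 5 (binary 0101)
print(digit_from_10, digit_from_4)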
19. Neural Network Algorithm
1. Initialize parameters, including weights and biases, randomly
2. Compute the output of each layer using Feed Forward Propagation (see the sketch below)
• The output from one layer is used as the input to the next layer
• No loops: information is always fed forward, never fed back
• Some other network models are not feed-forward, e.g. the Recurrent Neural
Network (RNN)
3. Use Error Back Propagation (BP) to learn the best weights and biases with
minimized cost
1. C(w, b) ≈ 0
2. Use Gradient Descent to update the weights and biases
#We want small weight adjustments to change the output gradually
#By adjusting again and again, we gradually build the network we want
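A minimal sketch of steps 1 and 2, assuming the weights and biases are stored one matrix/vector per layer; the layer sizes and random seed are arbitrary choices for illustration.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feed_forward(x, weights, biases):
    # the output a of one layer is used as the input to the next layer, never fed back
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(0)
sizes = [784, 30, 10]                                       # example layer sizes
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]
print(feed_forward(rng.random(784), weights, biases).shape)  # (10,) output activations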
20. Cost Function and Gradient Descent
ΔC ≈ (∂C/∂v1)Δv1 + (∂C/∂v2)Δv2
Written in vector form this becomes ΔC ≈ ∇C · Δv for the cost C(v1, v2),
where Δv = (Δv1, Δv2)ᵀ and ∇C = (∂C/∂v1, ∂C/∂v2)ᵀ is the gradient in vector/matrix notation
(here v1, v2 stand for the variables the cost depends on - for the network, the weights and biases)
#Gradient descent is like a drunkard stumbling down a hill
The gradient relates a change in the vector v to the resulting change in C
Choosing Δv = -η∇C guarantees ΔC ≈ -η‖∇C‖² is always negative, so the cost keeps decreasing
(a numeric sketch follows below)
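A tiny numeric sketch of the rule reconstructed above, using an illustrative cost C(v1, v2) = v1² + v2² (not the network's actual cost): stepping by Δv = -η∇C makes C shrink every iteration.

def C(v1, v2):
    return v1**2 + v2**2            # illustrative smooth cost, minimum at (0, 0)

def grad_C(v1, v2):
    return 2 * v1, 2 * v2           # ∇C = (∂C/∂v1, ∂C/∂v2)

v1, v2, eta = 3.0, -2.0, 0.1        # eta is the learning rate
for step in range(5):
    g1, g2 = grad_C(v1, v2)
    v1, v2 = v1 - eta * g1, v2 - eta * g2     # Δv = -η∇C, so ΔC ≈ -η‖∇C‖² ≤ 0
    print(step, round(C(v1, v2), 4))          # the cost decreases at every step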
28. Step2: Back Propagation
2.1: Update Weight: Hidden layer to Output layer
#Compute the derivative of output1's activation with respect to output1's weighted sum
#Using the special rule for differentiating the sigmoid (see the sketch below)
#Don't ask me why
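A one-line version of that "special" sigmoid derivative, σ'(z) = σ(z)(1 − σ(z)), checked against a finite difference; the variable names are mine.

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)            # σ'(z) = σ(z)(1 − σ(z)), computable from the activation alone

# numerical sanity check against a central finite difference
z, h = 0.7, 1e-6
print(sigmoid_prime(z), (sigmoid(z + h) - sigmoid(z - h)) / (2 * h))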
29. Step2: Back Propagation
2.1: Update Weight: Hidden layer to Output layer
#Compute the derivative of output1's weighted sum with respect to w5
#Multiplying the three terms together gives "the derivative of the total error with respect to weight w5"
30. Step2: Back Propagation
2.1: Update Weight: Hidden layer to Output layer
Learning Rate
#Update the value of weight w5 (a worked sketch of 2.1 follows below)
#That completes the weight update for one layer....
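A sketch of the three-factor chain rule and the update in step 2.1. Only the names w5, output1 and the learning rate come from the slides; the numeric values of the target, the activations, and η are hypothetical.

# three factors of the chain rule for ∂E_total/∂w5
target, out_o1 = 0.01, 0.75          # hypothetical target and output1 activation
out_h1, w5, eta = 0.60, 0.40, 0.5    # hypothetical hidden activation, weight w5, learning rate

dE_dout   = out_o1 - target                  # ∂E/∂out_o1 for quadratic cost E = ½(target − out)²
dout_dnet = out_o1 * (1 - out_o1)            # ∂out_o1/∂net_o1, the sigmoid derivative
dnet_dw5  = out_h1                           # ∂net_o1/∂w5 = activation feeding into w5

dE_dw5 = dE_dout * dout_dnet * dnet_dw5      # "derivative of the total error with respect to w5"
w5 = w5 - eta * dE_dw5                       # gradient descent update with learning rate η
print(dE_dw5, w5)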
31. Step2: Back Propagation
2.2: Update Weight: Hidden layer to Hidden Layer
#That completes Back Propagation
#Iterating this process over and over, you will see the error decrease (see the sketch below)
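A compact sketch of iterating the full forward and backward pass on a single training example, assuming a tiny 2-2-1 sigmoid network with quadratic cost (all sizes and values are illustrative); the printed error falls as the iterations proceed.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
x, y = np.array([0.05, 0.10]), np.array([0.99])      # one training example
W1, b1 = rng.standard_normal((2, 2)), np.zeros(2)
W2, b2 = rng.standard_normal((1, 2)), np.zeros(1)
eta = 0.5

for it in range(1000):
    # forward pass
    h = sigmoid(W1 @ x + b1)
    o = sigmoid(W2 @ h + b2)
    # backward pass for quadratic cost E = ½‖y − o‖²
    delta_o = (o - y) * o * (1 - o)
    delta_h = (W2.T @ delta_o) * h * (1 - h)
    W2 -= eta * np.outer(delta_o, h); b2 -= eta * delta_o
    W1 -= eta * np.outer(delta_h, x); b1 -= eta * delta_h
    if it % 200 == 0:
        print(it, float(0.5 * np.sum((y - o) ** 2)))  # the error keeps dropping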
32. Some Questions
• Why use a cost function like that? Quadratic cost (a simple quadratic)
• Why not maximize the output directly rather than
minimizing a proxy measure like quadratic cost?
#f(wx+b) = y is not a smooth function of the weights and biases
#Meaning that most of the time a small change to a weight/bias does not change the result at all
#So it is hard to tell whether a small tweak improves performance
#By minimizing a smooth cost function instead, small tweaks produce small, measurable changes (contrasted in the sketch below)
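A sketch of the contrast described in these notes, using a single sigmoid neuron and made-up data: the count of correct classifications stays flat under small weight tweaks, while the quadratic cost moves smoothly.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

xs = np.array([-2.0, -1.0, 1.0, 2.0])        # made-up inputs
ys = np.array([0.0, 0.0, 1.0, 1.0])          # made-up labels

def accuracy(w):
    # number of correctly classified points: a step-like function of w
    return int(np.sum((sigmoid(w * xs) > 0.5) == ys))

def quadratic_cost(w):
    # ½ Σ (y − a)²: smooth in w
    return 0.5 * np.sum((ys - sigmoid(w * xs)) ** 2)

for w in (0.50, 0.51, 0.52):
    print(w, accuracy(w), round(quadratic_cost(w), 5))   # accuracy stays flat, cost moves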