Tutorial on convolutional neural networks

Tutorial on
Convolutional Neural Networks
SKKU Data Mining LAB
Hojin Yang

Index
Intro
Convolution Layer Basics
CNN Backprop Implementation
CNN in Practice
CNN Architecture
Most of materials are built upon previous effort of CS231n
http://cs231n.stanford.edu/2017/

Intro
Neural
Net
Left turn
Right turn
No Left turn
No Right turn
U-turn
No U-turn
input
Pixels(28 by 28)
Values from -1 to 1
How would you like to model Neural net?

Intro
Pixels(28 by 28)
784 by 3
=2,352params
Fully Connected Neural Net
3 by 5
P(Left turn)
P(Right turn)
P(No Left turn)
P(U-turn)
P(No U-turn)
P(No Right turn)
…
Cannot preserve spatial structure
Need tons of parameters
784by1

Intro
Pixels(28 by 28)
Convolutional NN
P(Left turn)
P(Right turn)
P(No Left turn)
P(U-turn)
P(No U-turn)
P(No Right turn)
Three
7 X 7 filters
preserve spatial structure
Can reduce the # of params
3x(7x7)
=147 params
Created image
Element-wise
product

Intro
Pixels(28 by 28)
Use Filters!
7 by 7 filters

Intro
Pixels(28 by 28)
Use Filters!
P(Left turn)
P(Right turn)
P(No Left turn)
P(U-turn)
P(No U-turn)
P(No Right turn)
7 by 7 filters

Intro
There is a diagonal line Right arrow signal
on the upper right
Vertical line
on the mid-left
Umm..

Intro
There is a diagonal line Right arrow signal
on the upper right
Vertical line
on the mid-left Must be
‘No turn right’ !

Convolution Layer
Materials from CS231n : Lecture 5. Convolution Neural Net

Convolution Layer
1 0 1
0 1 1
1 0 1 Filter
http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution
https://cs.nyu.edu/~fergus/tutorials/deep_learning_cvpr12/

Convolution Layer
In convolutional neural networks (CNN) every network layer acts as a
detection filter for the presence of specific features or patterns present in
the original data.
The first layers(filters) in a CNN detect features that can be recognized and
interpreted relatively easy.
Later layers(filters) detect increasingly features that are more abstract.

Convolution Layer
Test: Interpret first layers’ Filters
Five
8X8 Filters
FNCC Softmax
Data Set: mnist
28x28x1

Convolution Layer
Filters Initialize

Convolution Layer
Epoch: 200
Epoch: 600
Filters Initialize

Convolution Layer
Filters Initialize
Epoch: 200
Epoch: 600
Epoch: 5400
Epoch: 19800
Epoch: 1000
Epoch: 19800
Test accuracy: 0.9778

Convolution Layer
http://scs.ryerson.ca/~aharley/vis/conv/flat.html

Convolution Layer
paddingMax pool
Size of next layer = (L-C+2P)/stride + 1

CNN Backprop Implementation
- Computational Graphs
- Backprop using for-loop
- im2col

Computational Graphs
A 𝜎 A soft
max
Loss
func
𝑥 1
𝑊 1
𝑏 1
𝑊 2
𝑏 2
𝑥 2 𝑙𝑜𝑔𝑖𝑡𝑠
loss
𝑠𝑐𝑎𝑙𝑎𝑟
𝑙𝑎𝑏𝑒𝑙𝑠
𝑦𝑎 1
xW + 𝑏

A 𝜎 A soft
max
Loss
func
𝑥 1
𝑊 1
𝑏 1
𝑊 2
𝑏 2
1
𝑦
𝜕𝐿/𝜕𝑦
loss
𝑎 1

A 𝜎 A soft
max
Loss
func
𝑥 1
𝑊 1
𝑏 1
𝑊 2
𝑏 2
1
𝑦
𝜕𝐿/𝜕𝑦
loss
𝜕𝐿/𝜕𝑙𝑜𝑔𝑖𝑡𝑠
𝜕𝐿/𝜕𝑏 2
𝜕𝐿/𝜕𝑊 2
𝜕𝐿/𝜕𝑥 2
𝑎 1

A 𝜎 A soft
max
Loss
func
𝑥 1
𝑊 1
𝑏 1
𝑊 2
𝑏 2
1
𝑦
𝜕𝐿/𝜕𝑦
loss
𝜕𝐿/𝜕𝑙𝑜𝑔𝑖𝑡𝑠
𝜕𝐿/𝜕𝑏 2
𝜕𝐿/𝜕𝑊 2
𝜕𝐿/𝜕𝑥 2
𝑎 1
𝜕𝐿/𝜕𝑎 2
𝜕𝐿/𝜕𝑏 1
𝜕𝐿/𝜕𝑊 1

class Affine:
def __init__(self,W,b):
self.W = W
self.b = b
self.x = None
self.dW = None
self.db = None
def forward(self,x):
self.x = x
out = np.dot(W,x) + self.b
return out
def backward(self,dout):
self.db = np.sum(dout,axis=0)
self.dW = np.dot(self.x.T,dout)
dx = np.dot(dout,self.W.T)
return dx
A
𝑊 1
𝑏 1

class Affine:
self.W = W
self.b = b
self.x = None
self.dW = None
self.db = None
self.x = x
return out
return dx
A
𝒙 𝟏
𝑊 1
𝑏 1
𝒂 𝟏
𝑖𝑛𝑝𝑢𝑡:
𝑜𝑢𝑡𝑝𝑢𝑡:
xW + 𝑏

class Affine:
self.W = W
self.b = b
self.x = None
self.dW = None
self.db = None
self.x = x
return out
return dx
A
𝑊 1
𝑏 1
𝝏𝑳/𝝏𝒂
𝝏𝑳/𝝏𝒙
𝝏𝑳/𝝏𝒃
𝝏𝑳/𝝏𝑾
𝒅𝒐𝒖𝒕
𝒙 𝟏

class Affine:
self.W = W
self.b = b
self.x = None
self.dW = None
self.db = None
self.x = x
return out
return dx
A
𝑊 1
𝑏 1
𝝏𝑳/𝝏𝒂
𝝏𝑳/𝝏𝒙
𝝏𝑳/𝝏𝒃
𝝏𝑳/𝝏𝑾
𝒅𝒐𝒖𝒕
+ + +
𝒙 𝟏
xW + 𝑏

class Affine:
self.W = W
self.b = b
self.x = None
self.dW = None
self.db = None
self.x = x
return out
return dx
A
𝑊 1
𝑏 1
𝝏𝑳/𝝏𝒂
𝝏𝑳/𝝏𝒃
𝝏𝑳/𝝏𝑾
𝒅𝒐𝒖𝒕
𝐱 𝐭
× 𝒅𝒐𝒖𝒕
𝝏𝑳/𝝏𝒙
𝒙 𝟏
𝒅𝒐𝒖𝒕 × 𝐰 𝐭

Backprop Using for-Loop
conv 𝜎
𝑥 1
𝑊 1
𝑥 2𝑎 1
Max
pooling
𝑥 2
conv
𝑊 2
𝜎
𝑥 2
Max
pooling

class Conv:
def __init__(self,W):
self.W = W
self.x = None
self.dW = None
conv
𝑊 1

conv
𝑥 1
𝑊 1
𝑎 1
self.x = x
#input과 filter의 column(=row) 크기 계산
size_x = self.x.shape[0]
size_W = self.W.shape[0]
#convolution 크기 계산후 생성 (x-w)/stride + 1
size_c = size_x - size_W + 1
conv_result = np.zeros(shape=(size_c,size_c))
#합성곱 연산
for i in range(size_c) :
for j in range(size_c):
partial = x[i:i+size_W , j:j+size_W]
conv_result[i][j] = partial * size_W
return conv_result

X
𝑎
𝑐
𝑏
X
𝑎
𝑑𝐿
𝑑𝑐
𝑏
Before CNN Backprop..

X
𝑎
𝑐
𝑏
X
𝑎
𝑑𝐿
𝑑𝑐
𝑏
𝑑𝐿
𝑑𝑐
∙ 𝑏
𝑑𝐿
𝑑𝑐
∙ 𝑎

𝑎(𝑚𝑎𝑡𝑟𝑖𝑥)
𝑐(scalar)
𝑏(𝑚𝑎𝑡𝑟𝑖𝑥)
conv
𝑎
𝑑𝐿
𝑑𝑐
𝑏
conv
𝑎1 𝑏1 + 𝑎2 𝑏2 + 𝑎3 𝑏3 + ⋯ 𝑎 𝑛 𝑏 𝑛 = 𝑐

𝑐(scalar)
conv
𝑎
𝑑𝐿
𝑑𝑐
𝑏
𝑑𝐿
𝑑𝑐
∙ 𝑏
𝑑𝐿
𝑑𝑐
∙ 𝑎
conv
𝑎(𝑚𝑎𝑡𝑟𝑖𝑥)
𝑏(𝑚𝑎𝑡𝑟𝑖𝑥)
𝑎1 𝑏1 + 𝑎2 𝑏2 + 𝑎3 𝑏3 + ⋯ 𝑎 𝑛 𝑏 𝑛 = 𝑐

conv
𝑥 1
𝑊 1
𝑎 1
def backward(self, dout):
#input과 filter,x의 column(=row) 크기 계산
size_x = self.x.shape[0] #column(=row) 크기
size_dout = dout.shape[0]
#filter의 미분 결과 matrix
self.dW = np.zeros(shape=(size_W,size_W))
#x의 미분 결과 matrix
dx = np.zeros(shape=(size_x,size_x))
for i in range(size_dout) :
for j in range(size_dout) :
partial = self.x[i:i+size_W, j:j+size_W]
self.dW += dout[i][j] * partial
dx[i:i+size_W,j:j+size+W] += dout[i][j] * self.W
return dx

im2col
There are highly optimized matrix multiplication routines for just about every platform(ex. Numpy)
-> turn convolution into matrix multiplication!

im2col
Materials from CS231n Winter : Lecture 11

im2col def im2col(input_data, filter_h, filter_w, stride=1, pad=0):
"""다수의 이미지를 입력받아 2차원 배열로 변환한다(평탄화).
Parameters
----------
input_data : 4차원 배열 형태의 입력 데이터(이미지 수, 채널 수, 높이, 너비)
filter_h : 필터의 높이
filter_w : 필터의 너비
stride : 스트라이드
pad : 패딩
Returns
-------
col : 2차원 배열
"""
N, C, H, W = input_data.shape
out_h = (H + 2*pad - filter_h)//stride + 1
out_w = (W + 2*pad - filter_w)//stride + 1
img = np.pad(input_data, [(0,0), (0,0), (pad, pad), (pad, pad)], 'constant')
col = np.zeros((N, C, filter_h, filter_w, out_h, out_w))
for y in range(filter_h):
y_max = y + stride*out_h
for x in range(filter_w):
x_max = x + stride*out_w
col[:, :, y, x, :, :] = img[:, :, y:y_max:stride, x:x_max:stride]
col = col.transpose(0, 4, 5, 1, 2, 3).reshape(N*out_h*out_w, -1)
return col
https://github.com/WegraLee/deep-learning-from-scratch/blob/master/common/util.py

10
Simple 2 by 2 Max-pooling
Max
Pool
10 9
8 7
𝑦′Max
Pool
𝑦′ 0
0 0

CNN in Practice
Effective Receptive Field
Bottleneck

input filter1 filter2

input Result 1
Receptive field :
3 by 3

input Result 1
Receptive field :
3 by 3 Result 2
Receptive field :
3 by 3

1
9
input Result 1
1 2 3
4 5 6
7 8 9
Result 2
Effective Receptive field :
5 by 5

ERF : 5 X 5
ERF : 5 X 5
ERF : 5 X 5
ERF : 7 X 7
ERF : 7 X 7

ERF : 5 X 5
ERF : 5 X 5
ERF : 5 X 5
ERF : 7 X 7
ERF : 7 X 7
Fewer Parameters
Less compute
More nonlinearity
GOOD!

H
W
C
1
1
C
…
C/2
Bottleneck

H
W
C
1
1
C
H
W
C/2
…
C/2
=
Bottleneck

1
1
C/2
H
W
C/2
…
C
Bottleneck

H
W
C
1
1
C/2
H
W
C/2
…
C
=
Bottleneck

H
W
C
…
H
W
C
# of filter=c
3
3 C
…
# of filter=c
…
# of filter=c
Bottleneck

H
W
C
…
H
W
C
# of filter=c
3
3 C
…
# of filter=c
…
# of filter=c
H
W
C
1
1 C
…
# of filter=c/2
Bottleneck

H
W
C
…
H
W
C
# of filter=c
3
3 C
…
# of filter=c
…
# of filter=c
H
W
C
1
1 C
…
# of filter=c/2
C/2
H
W
Bottleneck

H
W
C
…
H
W
C
# of filter=c
3
3 C
…
# of filter=c
…
# of filter=c
H
W
C
1
1 C
…
# of filter=c/2
C/2
H
W
…
# of filter=c/2
3
3 c/2
Bottleneck

H
W
C
…
H
W
C
# of filter=c
3
3 C
…
# of filter=c
…
# of filter=c
H
W
C
1
1 C
…
# of filter=c/2
C/2
H
W
…
# of filter=c/2
3
3 c/2
1
1
C/2
…
# of filter=c
Bottleneck

H
W
C
…
H
W
C
# of filter=c
3
3 C
…
# of filter=c
…
# of filter=c
H
W
C
1
1 C
…
# of filter=c/2
C/2
H
W
…
# of filter=c/2
3
3 c/2
1
1
C/2
…
# of filter=c
H
W
C
Bottleneck

Bottleneck

Bottleneck
Test: Measure the duration(time) and accuracy of the model using
bottleneck
Data Set: CIFAR
10
32x32x3 , 10
labels
128
7x7x3 Filters
Output:
32x32x128
Model 1
128
7x7x128 Filters
Relu
1 layer
Model 2
128
3x3x128 Filters
Relu each time
3 layers
Model 3
128
3x3x128 Filters
3 layers
+ bn & restore
Total 5 layers
Relu each time
Input:
32x32x128
2by2 Max pooling
Output:
16x16x128
FNCC
&
Softmax
Epoch: 450
Batch size: 100
due to the environment
condition

Tutorial on convolutional neural networks

step 0, test_accuracy 0.078
training time 5.438 sec
With Bottleneck Three 3 by 3 filters One 7 by 7 filters

CNN Architecture
VGG/Inception/ResNet
Materials from CS231n : Lecture 9. CNN Architecture

CNN Architecture
Materials from CS231n : Lecture 9 CNN Architecture

VGG

Inception

Inception
Auxiliary classification outputs to inject additional gradient
at lower layers (AvgPool-1x1Conv-FC-FC-Softmax)

ResNet

Most of Materials from CS231n
References
http://cs231n.stanford.edu/index.html
Computational Graph
밑바닥부터 시작하는 딥러닝 https://github.com/WegraLee/deep-learning-from-scratch
Inception Model
https://norman3.github.io/papers/docs/google_inception.html
Using tensorflow & tensorboard
https://www.tensorflow.org/get_started/mnist/pros
Tutorials & Programmer’s guide are really useful for me!
https://ratsgo.github.io/deep%20learning/2017/04/05/CNNbackprop/

Tutorial on convolutional neural networks

More Related Content

What's hot (20)

Similar to Tutorial on convolutional neural networks (20)

Recently uploaded (20)

Tutorial on convolutional neural networks