Convolutional Neural Network
Perception in Artificial Intelligence
Recall Neural Networks
• Neural networks receive an input (a single vector).
• They transform it through a series of hidden layers.
• Each hidden layer is made up of a set of neurons, where each neuron is fully connected to all neurons in the previous layer.
• Neurons in a single layer operate completely independently and do not share any connections.
• The last fully-connected layer is called the “output layer”; in classification settings it represents the class scores.
Recall Neural Networks
Suppose you have to design a neural network that classifies images into four categories: cat, dog, monkey, human.
The network consists of a single input layer, one hidden layer and an output layer.
The size of an image is 50*50, and the hidden layer has 100 units.
What will be the size of the weight matrix for the hidden layer and for the output layer?
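One way to work out the answer, as a sketch under two assumptions the slide does not state: the 50*50 image is grayscale (a single channel) and is flattened into one vector, and bias terms are not counted.

```python
# Sketch: weight-matrix shapes for the fully-connected network in the question.
# Assumptions (not stated on the slide): grayscale input, biases not counted.
image_height, image_width = 50, 50
num_hidden = 100
classes = ["cat", "dog", "monkey", "human"]

num_inputs = image_height * image_width            # image flattened to one vector
hidden_weights_shape = (num_inputs, num_hidden)    # input -> hidden
output_weights_shape = (num_hidden, len(classes))  # hidden -> output

total_weights = (hidden_weights_shape[0] * hidden_weights_shape[1]
                 + output_weights_shape[0] * output_weights_shape[1])

print(hidden_weights_shape)  # (2500, 100)
print(output_weights_shape)  # (100, 4)
print(total_weights)         # 250400
```

If the image had three color channels, the first matrix would have 7500 rows instead of 2500.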
Recall Neural Networks
In CIFAR-10, images are only of size 32x32x3 (32 wide, 32 high, 3 color channels).
How many weights do we need for a single fully-connected neuron in the first hidden layer?
32*32*3 = 3072 weights.
Smaller Network: Convolutional Neural Network
• Do we really need all the edges?
• Can some of these be shared?
Idea: Patterns are often much smaller than the whole image, so a detector does not need the whole image as input.
A “beak” detector can represent a small region with fewer parameters.
The same pattern appears in different places.
What about training a lot of such “small” detectors, where each detector must “move around” the image?
An “upper-left beak” detector and a “middle beak” detector do the same job, so they can be compressed to the same parameters.
Neural Networks -> Convolutional Neural Networks (CNN)
Instead of connecting a neuron to all of the neurons of the previous layer, connect each neuron to only some of the neurons in the previous layer.
[Figure: input layer → convolution → fully connected]
Single-Dimensional Convolution
1D Convolution
X = [x1, x2, x3]
W = [w1, w2]
Z = [z1, z2]
Sliding W over X one position at a time gives z1, then z2. Each output measures the correlation/similarity between W and one window of X.
Ẃ = [w2, w1] (W flipped)
Convolution is the same operation applied with the flipped filter Ẃ.
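The difference between correlation and convolution can be sketched in a few lines. The function names and the numeric values below are illustrative assumptions, not part of the slides; the sketch uses no padding and a stride of 1, so z1 = w1*x1 + w2*x2 and z2 = w1*x2 + w2*x3 for correlation.

```python
# Sketch: 1D cross-correlation versus convolution (no padding, stride 1).
def correlate_1d(x, w):
    """Slide w over x and take the dot product at each position."""
    n_out = len(x) - len(w) + 1
    return [sum(w[j] * x[i + j] for j in range(len(w))) for i in range(n_out)]


def convolve_1d(x, w):
    """Convolution is correlation with the flipped filter."""
    return correlate_1d(x, w[::-1])


x = [1, 2, 3]  # x1, x2, x3 (example values, not from the slide)
w = [10, 1]    # w1, w2

print(correlate_1d(x, w))  # [12, 23]
print(convolve_1d(x, w))   # [21, 32]
```

Because the filter weights are learned, the flip makes no practical difference to what a network can represent, which is why the sliding dot product is commonly just called "convolution" in this setting.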
A convolutional layer
A CNN is a neural network with some convolutional layers (and some other layers). A convolutional layer has a number of filters, each of which performs a convolution operation.
[Figure: a filter acting as a beak detector]
Convolution
1 0 0 0 0 1
0 1 0 0 1 0
0 0 1 1 0 0
1 0 0 0 1 0
0 1 0 0 1 0
0 0 1 0 1 0
6 x 6 image
1 -1 -1
-1 1 -1
-1 -1 1
Filter 1
-1 1 -1
-1 1 -1
-1 1 -1
Filter 2
…
…
These are the network
parameters to be learned.
Each filter detects a
small pattern (3 x 3).
Convolution with Filter 1
Slide Filter 1 over the 6 x 6 image and take the dot product with each 3 x 3 patch.
stride=1: the first two outputs are 3, -1
If stride=2: the first two outputs are 3, -3
Full output for Filter 1, stride=1:
3 -1 -3 -1
-3 1 0 -3
-3 -3 0 1
3 -2 -2 -1
Convolution with Filter 2
Repeat this for each filter.
-1 1 -1
-1 1 -1
-1 1 -1
Filter 2
Output for Filter 2, stride=1:
-1 -1 -1 -1
-1 -1 -2 1
-1 -1 -2 1
-1 0 -4 3
Feature Map: the two 4 x 4 images form a 2 x 4 x 4 matrix.
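The numbers on these slides can be reproduced with a short sketch. The helper name `conv2d` is an assumption made for illustration; it handles only square inputs and square filters, with no padding, which is all the 6 x 6 example needs.

```python
# Sketch: sliding-window convolution on the 6 x 6 example image.
def conv2d(image, kernel, stride=1):
    """Dot product of the kernel with every patch it fits on (no padding)."""
    k = len(kernel)
    n_out = (len(image) - k) // stride + 1
    out = []
    for r in range(0, n_out * stride, stride):
        row = []
        for c in range(0, n_out * stride, stride):
            row.append(sum(kernel[i][j] * image[r + i][c + j]
                           for i in range(k) for j in range(k)))
        out.append(row)
    return out


image = [
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
]
filter_1 = [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]   # diagonal pattern
filter_2 = [[-1, 1, -1], [-1, 1, -1], [-1, 1, -1]]   # vertical pattern

# One 4 x 4 map per filter, stacked into a 2 x 4 x 4 feature map.
feature_map = [conv2d(image, f) for f in (filter_1, filter_2)]
for channel in feature_map:
    for row in channel:
        print(row)

print(conv2d(image, filter_1, stride=2))  # [[3, -3], [-3, 0]]
```

The large positive entries (3) mark the places where the image patch matches the filter's pattern exactly: the diagonal in the upper-left and lower-left corners for Filter 1, the vertical bar in the lower right for Filter 2.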
Color image: RGB 3 channels
[Figure: the color image drawn as a stack of three 6 x 6 channels; Filter 1 and Filter 2 are each drawn as a stack of three 3 x 3 matrices, one per channel]
Convolution v.s. Fully Connected
[Figure: top, the 6 x 6 image convolved with Filter 1 and Filter 2; bottom, the same image flattened into 36 inputs x1, x2, …, x36 that feed a fully-connected layer]
6 x 6 image, Filter 1
[Figure: the image flattened into inputs 1, 2, 3, …; the neuron that outputs 3 connects only to inputs 1, 2, 3, 7, 8, 9, 13, 14, 15]
Only connect to 9 inputs, not fully connected: fewer parameters!
Shared weights
[Figure: the neuron that outputs -1 connects to inputs 2, 3, 4, 8, 9, 10, 14, 15, 16 and uses the same 9 weights as the neuron that outputs 3]
Fewer parameters: each neuron connects to only 9 inputs.
Even fewer parameters: the neurons share the same weights.
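The two savings can be made concrete with a small sketch. The helper `receptive_field` is a hypothetical name introduced here; it assumes the 6 x 6 image is flattened row by row with inputs numbered from 1, as in the figure, and the parameter counts ignore bias terms.

```python
# Sketch: a convolution viewed as a sparsely connected layer with shared weights.
def receptive_field(out_row, out_col, image_width=6, k=3):
    """1-based indices of the flattened image that one output neuron sees."""
    return [(out_row + i) * image_width + (out_col + j) + 1
            for i in range(k) for j in range(k)]


print(receptive_field(0, 0))  # [1, 2, 3, 7, 8, 9, 13, 14, 15]
print(receptive_field(0, 1))  # [2, 3, 4, 8, 9, 10, 14, 15, 16]

num_inputs = 6 * 6    # flattened image
num_outputs = 4 * 4   # one 4 x 4 output map (3 x 3 filter, stride 1)
k = 3

fully_connected = num_inputs * num_outputs  # every output sees every input
local_only = num_outputs * k * k            # 9 inputs each, weights not shared
shared = k * k                              # one filter reused at every position

print(fully_connected, local_only, shared)  # 576 144 9
```

For this example, restricting the connections cuts the count from 576 to 144, and sharing the weights cuts it again to the 9 values of the filter itself.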
A Closer Look at Spatial Dimensions
7*7 input (spatially), assume 3*3 size filter:
• with stride of 1 → 5*5 output
• with stride of 2 → 3*3 output
• with stride of 3 → what will be the output size? Doesn’t fit, you cannot do it.
Output size: (N-F)/stride+1, for an N*N input and an F*F filter
e.g. N = 7, F = 3
Stride 1 → (7-3)/1+1 = 5
Stride 2 → (7-3)/2+1 = 3
Stride 3 → (7-3)/3+1 = 2.33 (not a whole number, so the filter does not fit)
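The formula translates directly into a small helper. The function name is an assumption for illustration, and raising an error when the filter does not fit is a design choice of this sketch that mirrors the slide's "you cannot do it"; some frameworks instead round down and ignore the leftover border.

```python
# Sketch: output size (N - F) / stride + 1 for an N*N input and F*F filter.
def conv_output_size(n, f, stride=1):
    """Output width with no padding; rejects strides that do not fit."""
    if (n - f) % stride != 0:
        raise ValueError(
            f"{f}*{f} filter with stride {stride} does not fit a {n}*{n} input"
        )
    return (n - f) // stride + 1


print(conv_output_size(7, 3, stride=1))  # 5
print(conv_output_size(7, 3, stride=2))  # 3
try:
    conv_output_size(7, 3, stride=3)
except ValueError as err:
    print(err)  # stride 3 leaves a remainder, so it is rejected
```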
Zero Padding
7*7 input (spatially), surrounded by a border of zeros
Assume 3*3 size filter with stride of 1 → 7*7 output
Zero padding with (F-1)/2 will preserve the size spatially, e.g.
F=3 → zero pad with 1
F=5 → zero pad with 2
F=7 → zero pad with 3
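A minimal sketch of the padding rule follows. Both helper names are assumptions, and the size formula extends the slide's (N-F)/stride+1 with a padding term P, giving (N + 2P - F)/stride + 1; that extension is the commonly used form but is not written on the slide.

```python
# Sketch: zero padding and its effect on the output size.
def zero_pad(image, p):
    """Surround a 2D list with a border of p zeros on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]
    padded += [[0] * p + list(row) + [0] * p for row in image]
    padded += [[0] * width for _ in range(p)]
    return padded


def padded_output_size(n, f, stride=1, pad=0):
    """Output width for an n*n input padded with `pad` zeros per side."""
    return (n + 2 * pad - f) // stride + 1


image = [[1] * 7 for _ in range(7)]       # placeholder 7*7 input
padded = zero_pad(image, 1)
print(len(padded), len(padded[0]))        # 9 9

for f in (3, 5, 7):
    pad = (f - 1) // 2
    print(f, pad, padded_output_size(7, f, stride=1, pad=pad))  # size stays 7
```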
The whole CNN
Convolution → Max Pooling → Convolution → Max Pooling → Flattened → Fully Connected Feedforward network → cat dog ……
Convolution and Max Pooling can repeat many times.
Max Pooling
Output of Filter 1:
3 -1 -3 -1
-3 1 0 -3
-3 -3 0 1
3 -2 -2 -1
Output of Filter 2:
-1 -1 -1 -1
-1 -1 -2 1
-1 -1 -2 1
-1 0 -4 3
Why Pooling
• Subsampling pixels will not change the object: a subsampled bird is still a bird.
• We can subsample the pixels to make the image smaller, so fewer parameters are needed to characterize the image.
A CNN compresses a fully connected network in two ways:
• Reducing the number of connections
• Sharing weights on the edges
Max pooling further reduces the complexity.
Max Pooling
6 x 6 image → Conv → Max Pooling → new image but smaller (2 x 2 image)
Each filter is a channel.
Channel from Filter 1:
3 0
3 1
Channel from Filter 2:
-1 1
0 3
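The pooled values can be checked with a short sketch. It assumes non-overlapping 2 x 2 windows, which the slide does not state explicitly but which is what turns a 4 x 4 map into a 2 x 2 one; `max_pool2d` is an illustrative name.

```python
# Sketch: max pooling over non-overlapping size x size windows.
def max_pool2d(feature_map, size=2):
    """Keep the largest value in each size x size block."""
    n_out = len(feature_map) // size
    return [[max(feature_map[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(n_out)]
            for r in range(n_out)]


map_1 = [[3, -1, -3, -1], [-3, 1, 0, -3], [-3, -3, 0, 1], [3, -2, -2, -1]]
map_2 = [[-1, -1, -1, -1], [-1, -1, -2, 1], [-1, -1, -2, 1], [-1, 0, -4, 3]]

pooled = [max_pool2d(m) for m in (map_1, map_2)]  # one channel per filter
print(pooled[0])  # [[3, 0], [3, 1]]
print(pooled[1])  # [[-1, 1], [0, 3]]
```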
The whole CNN
Convolution → Max Pooling → Convolution → Max Pooling (can repeat many times)
Each Convolution and Max Pooling step produces a new image:
• It is smaller than the original image.
• The number of channels is the number of filters.
Flattening
The pooled 2 x 2 channels are flattened into a single vector:
3, 0, 1, 3, -1, 1, 0, 3
The flattened vector is the input to the Fully Connected Feedforward network.
From Matrices to Tensors: Working With
3D Images
Convolutional Layer Settings
Pooling
Pooling Layer Settings
Question
Input volume: 32*32*3
10 filters of size 5*5 with stride 1, pad 2
Output size?
Number of parameters?
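A worked answer, as a sketch: it assumes each filter spans all 3 input channels and carries one bias term, neither of which the slide states, and it uses the padded size formula (N + 2P - F)/stride + 1.

```python
# Sketch: output volume and parameter count for the question above.
# Assumptions: each filter covers all input channels and has one bias.
n, depth = 32, 3
num_filters, f, stride, pad = 10, 5, 1, 2

out = (n + 2 * pad - f) // stride + 1            # spatial size of the output
output_volume = (out, out, num_filters)

params_per_filter = f * f * depth + 1            # 5*5*3 weights + 1 bias
total_params = num_filters * params_per_filter

print(output_volume)      # (32, 32, 10)
print(params_per_filter)  # 76
print(total_params)       # 760
```

Without the bias terms the count would be 750 instead of 760.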
LeNet-5
AlexNet-2012
ZFNet-2013
VGGNet-2014
ResNet-2015
CNN in Keras
Only modified the network structure and input format (vector -> 3-D tensor)
input → Convolution → Max Pooling → Convolution → Max Pooling
First Convolution: there are 25 3x3 filters.
Input_shape = ( 28 , 28 , 1): 28 x 28 pixels; 1: black/white, 3: RGB
Input: 1 x 28 x 28
Convolution → 25 x 26 x 26
Max Pooling → 25 x 13 x 13
Convolution → 50 x 11 x 11
Max Pooling → 50 x 5 x 5
Flattened → 1250
Fully connected feedforward network → Output
How many parameters for each filter? First Convolution: 9. Second Convolution: 225 = 25x9.
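The shapes and per-filter parameter counts above can be traced without any deep learning library. This is a plain-Python sketch, not Keras code: the helper names are hypothetical, the filters are assumed to be 3x3 with stride 1 and no padding, pooling is assumed to be 2x2 and to round down, and bias terms are ignored to match the counts on the slide.

```python
# Sketch: tracing (channels, height, width) through the network on the slide.
def conv_shape(shape, num_filters, k=3):
    """k x k convolution, stride 1, no padding."""
    _, h, w = shape
    return (num_filters, h - k + 1, w - k + 1)


def pool_shape(shape, size=2):
    """Non-overlapping pooling; odd sizes round down."""
    channels, h, w = shape
    return (channels, h // size, w // size)


def params_per_filter(in_shape, k=3):
    """Each filter spans every input channel (bias not counted)."""
    return in_shape[0] * k * k


shape = (1, 28, 28)
trace = [shape]
filter_params = []
for num_filters in (25, 50):
    filter_params.append(params_per_filter(shape))
    shape = conv_shape(shape, num_filters)
    trace.append(shape)
    shape = pool_shape(shape)
    trace.append(shape)

flattened = shape[0] * shape[1] * shape[2]

print(trace)          # [(1, 28, 28), (25, 26, 26), (25, 13, 13), (50, 11, 11), (50, 5, 5)]
print(filter_params)  # [9, 225]
print(flattened)      # 1250
```

The second layer's filters are larger (225 weights each) because each one has to cover all 25 channels produced by the first layer.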
AlphaGo
Input: 19 x 19 matrix (Black: 1, white: -1, none: 0) → Neural Network → Next move (19 x 19 positions)
A fully-connected feedforward network can be used, but a CNN performs much better.
AlphaGo’s policy network
Note: AlphaGo does not use Max Pooling.
[Figure: quotation from their Nature article]
CNN in speech recognition
[Figure: a spectrogram (axes: Time, Frequency) is treated as an image and fed to the CNN]
The filters move in the frequency direction.
CNN in text classification
Source of image: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.703.6858&rep=rep1&type=pdf
http://www.joshuakim.io/understanding-how-convolutional-neural-network-cnn-perform-text-classification-with-word-embeddings/
https://www.analyticsvidhya.com/blog/2020/02/mathematics-behind-convolutional-neural-network/



Editor's Notes

  • #15 to #18: You have filters, and you search the image for the shapes the filters describe. Learning is about finding filters that are good for detecting objects; more generally, it is about finding properties of the data that occur more often than noise.
  • #18 (additional): W bar is W with its parameters flipped. Stride is the number of steps the filter moves over the image at a time, typically one. Zero padding is another common practice, used to obtain a feature map of the same size as the image.