[Revised] Intro to CNN

Google Tech Sprint
Intro to CNN
By Vincent Tatan
Google Data Analyst (Machine Learning)

Meet Vincent
Data Analyst (Machine Learning)
Trust & Safety Google
Medium : towardsdatascience.com/@vincentkernn
Linkedin : linkedin.com/in/vincenttatan/
Data Podcast: https://datacast.simplecast.com/

Disclaimer
This disclaimer informs readers that the views, thoughts, and opinions
expressed in the text belong solely to the author, and not necessarily to the
author’s employer, organization, committee or other group or individual.
This article was written purely on the author’s own initiative and is in no way
driven by any hidden agenda.

Path to Google
B.Sc., Management Information Systems and Services
Aug 13 - July 17
Visa, Business Intelligence Intern
May 16 - Jul 16
Lazada Group, Data Scientist Intern
Dec 16 - Apr 17
Visa, Data & Architecture Engineer
Jun 17 - Aug 19
Google, Data Analyst, Machine Learning
Aug 19 - Present

Google: Trust and Safety
To prevent phishing at scale with Data Analytics and ML
Google, Data Analyst, Machine Learning
Aug 19 - Present

Understanding CNN
Medium Articles
https://towardsdatascience.com/understanding-cnn-convolutional-neural-network-69fd626ee7d4

Colab Demonstration
Cat vs. Dog using CNN
https://colab.sandbox.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/images/classification.ipynb

Today’s Agenda
1. Machine Learning in Image Recognition
2. Principles behind CNN: Convolution, ReLU, Max Pooling
3. CNN Stacks + Fully Connected Layer
4. Avoiding Overfitting + Transfer Learning

1 Machine Learning in Image Recognition
Classify objects in images

Image Recognition is hard -- Are all of these cats?
My search for the word “cat”.

Warning -- Not all of these are cats.
We need an automatic way to determine critical features to identify cats or other objects.
Black color
Sharp ears

We need a way to determine the features scalably

2 Principles behind CNN
Convolution, ReLU, Max Pooling

Introducing Convolutional Neural Network (CNN)
A type of neural network composed of layers of convolutions and max
pooling, used mainly for image recognition
1. Convolution
2. ReLU Activation
3. Max Pooling

Intuition of Convolutional Neural Network
What do you see?
Young lady or old grandma or
both?

In 1930, psychologist Edwin Boring
introduced the public to the painting
“My Wife and My Mother-in-Law”, in
which the figure seems to morph from
a young lady into an old woman. (Source)
These boxes mark the features you use to classify the object.
● The curve that reads as either an ear or an eye
● The straight line that reads as either a choker or a mouth

This means that, to identify features, you rely on
● Prior knowledge (your scrolling habits/understanding of faces)
● Data (strokes of the painting)

A CNN classifies an image by picking out meaningful features from groups of
neighboring pixels rather than examining it pixel by pixel
1 1 1 1
0 0 1 1
0 1 0 1
1 1 1 1
0 0 0 0
1 1 1 1
1 1 1 1
0 0 0 0

Principle of CNN: Convolution (1-D)
At which position does the sequence change from 1 to 0?
[1,1,1,1,0,0]
Hint: the answer is [0,0,0,1,0]

To find it, slide the kernel [1, -1] across the input [1, 1, 1, 1, 0, 0] and sum the products at each position:
1st element : 1*1 + 1*-1 = 0
2nd element : 1*1 + 1*-1 = 0
3rd element : 1*1 + 1*-1 = 0
4th element : 1*1 + 0*-1 = 1
5th element : 0*1 + 0*-1 = 0
End Result: 0 0 0 1 0
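
The element-by-element calculation above can be reproduced with a short sketch. The function name `slide_kernel_1d` is hypothetical and chosen for illustration; like most deep learning libraries, it slides the kernel without flipping it.

```python
def slide_kernel_1d(signal, kernel):
    """Slide `kernel` over `signal`; at each position, sum the element-wise products."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


signal = [1, 1, 1, 1, 0, 0]
kernel = [1, -1]  # responds with 1 where a 1 is followed by a 0

result = slide_kernel_1d(signal, kernel)
print(result)  # [0, 0, 0, 1, 0]
```

The single 1 in the result marks the 4th position, where the input drops from 1 to 0.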

The same principle applies in 2-D, where you derive the feature map by sliding kernels (which act as
the weights in a neural network) over the image. With this kernel you can find diagonal patterns.
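
As a minimal sketch of the 2-D case, the snippet below slides a small kernel over a toy image. The 2x2 `diagonal_kernel`, the 4x4 `image`, and the function name `slide_kernel_2d` are assumptions made for illustration, not the values shown on the slide.

```python
def slide_kernel_2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) to build a feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical kernel that responds to a top-left to bottom-right diagonal.
diagonal_kernel = [
    [1, 0],
    [0, 1],
]

# Toy image containing one diagonal line.
image = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

feature_map = slide_kernel_2d(image, diagonal_kernel)
for row in feature_map:
    print(row)
# [2, 0, 0]
# [0, 2, 0]
# [0, 0, 2]
```

The feature map is largest exactly where the diagonal pattern sits in the image.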

Principle of CNN: Convolution (2-D), 3 stacks
Feature Maps

Principle of CNN: Activation Function ReLU
The ReLU function introduces non-linearity and activates only on a “big enough” stimulus (sparse
activation). It helps mitigate vanishing gradients because its gradient is a constant 1 for every
positive input. Inputs that are zero or negative are set to 0 by the ReLU activation
function.
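
A minimal sketch of ReLU and its gradient, with hypothetical function names, might look like this:

```python
def relu(x):
    """Pass positive inputs through unchanged; clamp everything else to 0."""
    return max(0.0, x)


def relu_gradient(x):
    """Gradient of ReLU: a constant 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


inputs = [-2.0, -0.5, 0.0, 0.5, 3.0]
print([relu(x) for x in inputs])           # [0.0, 0.0, 0.0, 0.5, 3.0]
print([relu_gradient(x) for x in inputs])  # [0.0, 0.0, 0.0, 1.0, 1.0]
```

Because the gradient does not shrink for large positive inputs, it is not repeatedly scaled down as it is propagated back through many layers.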

Principle of CNN: Max Pooling (2-D)
A CNN uses max pooling to replace each region of the output with its maximum value, which reduces
data size and processing time. This keeps the features that produce the strongest response
and reduces the risk of overfitting.
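
As a rough sketch, 2x2 max pooling over a feature map can be written as follows. The function name `max_pool_2d`, the non-overlapping windows, and the sample values are assumptions for illustration.

```python
def max_pool_2d(feature_map, size=2):
    """Replace each non-overlapping size x size window with its maximum value."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


feature_map = [
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
]

pooled = max_pool_2d(feature_map)
print(pooled)  # [[6, 5], [7, 9]]
```

The 4x4 map shrinks to 2x2, so later layers process a quarter of the values while keeping the strongest response from each window.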

Recap
Understanding CNN
1. Convolution
2. ReLU Activation Function
3. Max Pooling

3 CNN Stacks + Fully Connected Layer
Generating Real Results

CNN Stacks + Fully Connected Layer (2-D)
Take the feature maps from the CNN stacks as inputs and feed them into a neural network that calculates the
probability of each label for the image.
Finally, we pass the convolution and max pooling feature map outputs to a Fully Connected Layer
(FCL). We flatten the feature outputs into a column vector and feed it forward through the FCL. Every node in one
layer is connected to every node in the next, and each node in the last layer represents a distinct output label. The FCL
is an ordinary Artificial Neural Network, in effect a Multilayer Perceptron (MLP), placed at the end of the CNN.
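
The flatten and feed-forward steps can be sketched as below. The function names, the single 2x2 feature map, and the hand-picked weights for the two labels are all hypothetical; in a real model the weights are learned during training.

```python
def flatten(feature_maps):
    """Unroll a stack of 2-D feature maps into one flat vector."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def dense(inputs, weights, biases):
    """Fully connected layer: every input feeds every output node."""
    return [
        sum(w * x for w, x in zip(node_weights, inputs)) + bias
        for node_weights, bias in zip(weights, biases)
    ]


# One hypothetical 2x2 feature map coming out of the CNN stacks.
feature_maps = [[[1.0, 0.0], [0.0, 1.0]]]
flat = flatten(feature_maps)
print(flat)  # [1.0, 0.0, 0.0, 1.0]

# One row of weights per output label (illustrative values, not learned).
weights = [
    [0.5, 0.0, 0.0, 0.5],  # "cat" node
    [0.0, 0.5, 0.5, 0.0],  # "dog" node
]
biases = [0.0, 0.0]

scores = dense(flat, weights, biases)
print(scores)  # [1.0, 0.0]
```

Each output node produces one raw score per label; turning those scores into probabilities is the job of the final activation.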

CNN Stacks: Cats vs Dogs
The discovery of curves

Fully Connected Layer Activation: Softmax
The softmax activation function squashes the raw output scores into probabilities that add up to 1.0.
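
A minimal softmax sketch, assuming a plain list of raw scores (the sample scores are made up), could look like this:

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1.0."""
    highest = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


raw_scores = [2.0, 1.0, 0.1]  # hypothetical scores for three labels
probabilities = softmax(raw_scores)
print(probabilities)
print(sum(probabilities))
```

The largest score always receives the largest probability, so the predicted label is simply the position of the maximum.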

CNN Big Picture: Cats vs Dogs
The sensors of the neural network might start by understanding what edges and colors it is looking at.
Another layer is trying to understand what shapes it is looking at.
A third layer is trying to conceptualize what these shapes represent (for instance, is it an animal or something else?).
The last layer, called the fully connected layer, summarizes this analysis and gives the probability of whether it is looking at a dog.

Create and Compile our CNN
Change the loss to ‘categorical_crossentropy’ for softmax
CNN Stacks
Fully Connected Layer

Results: Cats vs Dogs
Overfitting!
loss: 0.0134
acc: 0.9965
val_loss: 3.5137
val_acc: 0.7290

4 Avoiding Overfitting + Transfer Learning
Improve generalization and leverage pretrained models

Overfitting Mistakes: CNN
Common Mistakes | Solutions
Using the test set as the validation set to tune the model | Use a separate validation set for validation and keep the test set for the final test
Dataset is relatively small | Image augmentation to add new variants of images
Over-memorization | Dropout, or reduce epochs/layers/neurons per layer, to improve generalization
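
To illustrate the first fix, here is a minimal sketch of a three-way split that keeps the test set untouched until the final evaluation. The function name `split_dataset`, the 15% fractions, and the file names are assumptions for illustration.

```python
import random


def split_dataset(items, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle once, then carve out disjoint test, validation, and training sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]                 # used once, for the final test
    val = shuffled[n_test:n_test + n_val]    # used while tuning the model
    train = shuffled[n_test + n_val:]        # used to fit the weights
    return train, val, test


image_files = [f"image_{i}.jpg" for i in range(100)]  # hypothetical file names
train, val, test = split_dataset(image_files)
print(len(train), len(val), len(test))
```

Tuning against the validation set and reporting only on the held-out test set keeps the final score an honest estimate of generalization.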

Overfitting Solutions: Data Augmentation
When the training data set is small, you need to artificially boost the diversity and number of training examples.
One way of doing this is to apply image augmentations that create new variants. These include translating
images and changing their dimensions through zoom, crop, flips, etc.
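
As a small sketch of the idea, the snippet below produces a flipped and a translated variant of a toy image stored as nested lists. The helper names and the one-pixel shift are hypothetical; image libraries provide richer versions of these operations.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in image]


def translate_right(image, shift, fill=0):
    """Shift every row right by `shift` pixels, padding the left edge with `fill`."""
    return [[fill] * shift + row[:len(row) - shift] for row in image]


def augment(image):
    """Return the original image plus two new variants of it."""
    return [image, flip_horizontal(image), translate_right(image, 1)]


image = [
    [1, 2, 3],
    [4, 5, 6],
]

for variant in augment(image):
    print(variant)
# [[1, 2, 3], [4, 5, 6]]
# [[3, 2, 1], [6, 5, 4]]
# [[0, 1, 2], [0, 4, 5]]
```

Each original image yields three training examples here, which is how augmentation increases both the number and the diversity of examples.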

Overfitting Solutions: Dropout
You could also use regularization techniques such as Dropout, which randomly deactivates activation units at every gradient step during training.
Each training step deactivates a different set of neurons.
Since the number of gradient steps is usually high, every neuron is dropped about equally often on average.
Intuitively, the more you drop out, the less likely your model is to memorize the training data.
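
A minimal dropout sketch follows. It assumes the common "inverted dropout" variant, which rescales the surviving activations so their expected sum stays unchanged; the function name and the 0.5 rate are illustrative.

```python
import random


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; rescale the survivors.

    Assumes 0 <= rate < 1. Rescaling by 1 / (1 - rate) is the inverted
    dropout convention, an assumption here rather than something the slides state.
    """
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
activations = [1.0] * 1000

# Each call stands for one gradient step and draws a fresh random mask.
step_one = dropout(activations, rate=0.5, rng=rng)
step_two = dropout(activations, rate=0.5, rng=rng)

print(step_one.count(0.0), step_two.count(0.0))  # roughly half are dropped each time
```

Dropout is applied only during training; at prediction time every neuron stays active.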

Much better
loss: 0.5622
acc: 0.7145
val_loss: 0.5090
val_acc: 0.7650

Overfitting CNN
loss: 0.0134
acc: 0.9965
val_loss: 3.5137
val_acc: 0.7290
CNN with Dropout and Image Augmentation
loss: 0.5622
acc: 0.7145
val_loss: 0.5090
val_acc: 0.7650

Transfer Learning: ImageNet
Transfer learning is a technique that reuses an existing model as the starting point for the current model. You can build on top of existing models that were
carefully designed by experts and trained on millions of pictures. You can freeze the pretrained layers, then train only the high-level
layers.
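
The freeze-then-train idea can be sketched with a toy example. Everything here is hypothetical: `pretrained_features` stands in for a frozen pretrained base (in practice, a network trained on ImageNet), and only the small classification head on top of it is updated.

```python
import math


def pretrained_features(x):
    """Stand-in for a frozen pretrained base: fixed, never updated in training."""
    return [x[0] + x[1], x[0] - x[1]]


def predict(head, features):
    """Trainable head: a single sigmoid unit on top of the frozen features."""
    z = sum(w * f for w, f in zip(head["weights"], features)) + head["bias"]
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, epochs=200, learning_rate=0.5):
    """Gradient descent that updates only the head's weights and bias."""
    head = {"weights": [0.0, 0.0], "bias": 0.0}
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            features = pretrained_features(x)  # frozen: no update happens here
            error = predict(head, features) - y  # log-loss gradient w.r.t. z
            head["weights"] = [
                w - learning_rate * error * f
                for w, f in zip(head["weights"], features)
            ]
            head["bias"] -= learning_rate * error
    return head


# Toy two-class data set (made up for illustration).
samples = [[0.0, 0.0], [0.2, 0.1], [0.9, 1.0], [1.0, 0.8]]
labels = [0, 0, 1, 1]

head = train_head(samples, labels)
for x in samples:
    print(x, round(predict(head, pretrained_features(x)), 3))
```

Only three numbers are learned (two weights and a bias), which is why training a head on top of a frozen base works with far less data than training the whole model.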

Important Resources (Teachable Machine)
https://teachablemachine.withgoogle.com/train/image

Today’s Summary
01 Machine Learning in Image Recognition
● Classical image recognition lets you extract features and derive
characteristics.
● Convolutional Neural Network (CNN): a deep learning technique that uses
convolutions and max pooling to share weights, produce critical features, and
classify labels. Commonly used in image classification.
02 Principles behind CNN: Convolution, ReLU, and Max Pooling
● Convolution: derive the feature map by sliding kernels (which act as the
weights in a neural network) over the input, in 1-D or 2-D.
● ReLU: introduces non-linearity and activates only on a “big enough”
stimulus (sparse activation).
● Max pooling: replaces each region of the output with its maximum to reduce
data size and processing time.
03 CNN Stacks + Fully Connected Layer
● Fully Connected Layer: takes the feature maps from the CNN stacks as inputs and
feeds them into a neural network that calculates the probability of each label
for the image.
● Softmax: an activation function that assigns each possible label a decimal
probability, with the probabilities adding up to 1.0.
04 Avoiding Overfitting + Transfer Learning
● Data Augmentation: artificially boosts the diversity and number of training examples.
● Dropout: a regularization technique that randomly deactivates activation units at every
gradient step during training. Each step deactivates a different set of neurons.
● Transfer learning: reuses an existing model as the starting point for the current
model. You can build on top of existing models that were carefully
designed by experts and trained on millions of pictures.

Reach out to me :)
Medium : towardsdatascience.com/@vincentkernn
Linkedin : linkedin.com/in/vincenttatan/
Survey: tiny.cc/vincent-survey
