Batch Normalization
Accelerating Deep Network Training by Reducing Internal Covariate Shift
Ho Kun Lin
2023/8/3
Outline
Introduction
Related Works
Methodology
Experimental Results
Conclusion
Introduction
Sergey Ioffe and Christian Szegedy, Google Research; ICML 2015
Deep neural networks are hard to train
SGD is simple but requires careful tuning
The inputs to each layer are affected by the parameters of all preceding layers
Each layer must continuously adapt to a new input distribution
Introduction
Vanishing gradients slow down convergence
A changing input distribution is likely to move x into the saturated regime of the nonlinearity, where the gradient is near zero (see the quick check below)
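As a quick numeric illustration (not from the slides), the sigmoid gradient sigma'(x) = sigma(x)(1 - sigma(x)) collapses once x drifts into the saturated regime:

```python
import numpy as np

# Sigmoid gradient sigma'(x) = sigma(x) * (1 - sigma(x)).
# It shrinks rapidly as x moves away from 0 into the saturated regime.
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
for x in (0.0, 2.5, 5.0):
    s = sigmoid(x)
    print(x, s * (1 - s))   # 0.25, ~0.070, ~0.0066
```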
Introduction
Batch Normalization fixes the means and variances of layer inputs
Reduces SGD's dependence on parameter initialization
Reduces the need for Dropout
Matches the SOTA model on ImageNet using only 7% of the training steps
Outline
Introduction
Related Works
Methodology
Experimental Results
Conclusion
Related Works
Normalizing the inputs
Covariate shift
The input distribution to a system changes
Internal covariate shift
The distribution of a layer's inputs changes during training, due to updates in the preceding layers
Outline
Introduction
Related Works
Methodology
Experimental Results
Conclusion
Methodology
Fix the distribution of the layer inputs
Normalize each scalar feature independently, since full whitening is costly
Normalize via mini-batch statistics, since SGD already operates on mini-batches
Methodology - BN Transform
Consider a mini-batch B = {x_1, ..., x_m} of size m
Compute the mini-batch mean and variance, normalize each activation, then scale and shift
Add learnable parameters gamma and beta, so the transform can also represent the identity (equations below)
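For reference, the BN transform over a mini-batch B = {x_1, ..., x_m}, as given in the paper's Algorithm 1 (epsilon is a small constant for numerical stability):

```latex
\begin{aligned}
\mu_{\mathcal{B}} &= \frac{1}{m}\sum_{i=1}^{m} x_i
  && \text{mini-batch mean} \\
\sigma_{\mathcal{B}}^{2} &= \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu_{\mathcal{B}}\right)^{2}
  && \text{mini-batch variance} \\
\hat{x}_i &= \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon}}
  && \text{normalize} \\
y_i &= \gamma\,\hat{x}_i + \beta \equiv \mathrm{BN}_{\gamma,\beta}(x_i)
  && \text{scale and shift}
\end{aligned}
```

Setting gamma = sqrt(Var[x]) and beta = E[x] would recover the original activations, which is why the learnable pair preserves the layer's expressivity.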
Methodology - Train&Inference
1 4
Methodology - Train&Inference
1 5
Methodology - Train&Inference
1 6
Output depends on data in mini-batch!
Methodology - Train&Inference
1 7
Training Inferencing
Methodology - Train&Inference
1 8
Training Inferencing
Methodology - Train&Inference
1 9
Training Batch Normalization Network Find E[x], Var[x] and inference
Methodology - Train&Inference
2 0
Training Batch Normalization Network Find E[x], Var[x] and inference
Methodology - Train&Inference
2 1
Training Batch Normalization Network Find E[x], Var[x] and inference
Methodology - Train&Inference
2 2
Training Batch Normalization Network Find E[x], Var[x] and inference
Methodology - CNN
Put BN before the nonlinearity
Also learn gamma and beta, one pair per feature map (channel)
To preserve the convolutional property, normalize jointly over the mini-batch and all spatial locations: the effective mini-batch size is m' = m * p * q for a p x q feature map (sketch below)
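A sketch of the convolutional case (function name and shapes are illustrative): statistics are shared across spatial positions, with one gamma/beta pair per channel:

```python
import numpy as np

def batchnorm_conv(x, gamma, beta, eps=1e-5):
    """BN for a conv. feature map x of shape (N, C, H, W).

    Statistics are computed over the batch and all spatial positions,
    i.e. an effective mini-batch of size N * H * W per channel, and
    gamma/beta have one entry per channel.
    """
    mu = x.mean(axis=(0, 2, 3), keepdims=True)   # shape (1, C, 1, 1)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)
```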
Methodology - Observation
With BN, backpropagation is unaffected by the scale of the parameters: for a scalar a, BN(Wu) = BN((aW)u), and therefore
∂BN((aW)u)/∂u = ∂BN(Wu)/∂u
∂BN((aW)u)/∂(aW) = (1/a) · ∂BN(Wu)/∂W
Larger weights receive smaller gradients, so BN stabilizes parameter growth
BN enables higher learning rates
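An illustrative numeric check of the scale-invariance claim (gamma/beta omitted; the sizes and the factor a are arbitrary choices of mine):

```python
import numpy as np

def bn(z, eps=1e-5):
    # Batch normalization without the learnable scale/shift.
    return (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)

rng = np.random.default_rng(0)
u = rng.normal(size=(64, 100))   # mini-batch of layer inputs
W = rng.normal(size=(100, 50))   # layer weights
a = 10.0                         # arbitrary rescaling of W

# BN(Wu) and BN((aW)u) agree (up to the negligible effect of eps),
# so the gradient w.r.t. u is unchanged by the scale of W.
print(np.allclose(bn(u @ W), bn(u @ (a * W)), atol=1e-4))   # True
```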
Outline
Introduction
Related Works
Methodology
Experimental Results
Conclusion
MNIST dataset
Handwritten digits dataset
28 x 28 pixel monochrome images
60K training images
10K testing images
10 labels
Used here to verify internal covariate shift
MNIST dataset - NN Setup
28 x 28 binary image input
3 FC hidden layers with 100 sigmoid activations each
1 FC output layer with 10 activations and cross-entropy loss
Train for 50K steps, 60 examples per mini-batch
BN added to each hidden layer
W initialized to small random Gaussian values
(a sketch of this setup follows)
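A minimal PyTorch sketch of this setup (the framework choice and the learning rate are assumptions; layer sizes follow the slide):

```python
import torch
from torch import nn

# 3 fully connected hidden layers of 100 units, BN before each sigmoid,
# then a 10-way output layer trained with cross-entropy loss.
model = nn.Sequential(
    nn.Flatten(),                                           # 28 x 28 -> 784
    nn.Linear(784, 100), nn.BatchNorm1d(100), nn.Sigmoid(),
    nn.Linear(100, 100), nn.BatchNorm1d(100), nn.Sigmoid(),
    nn.Linear(100, 100), nn.BatchNorm1d(100), nn.Sigmoid(),
    nn.Linear(100, 10),                                     # one logit per digit
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)     # lr is illustrative
```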
MNIST dataset - Result
x-axis: training steps
y-axis: test accuracy
The network with BN reaches higher test accuracy, faster
MNIST dataset - Result
x-axis: training steps
y-axis: value of a typical sigmoid input
Lines show the {15, 50, 85}th percentiles
Without BN, the input distribution drifts markedly over training
With BN, the distribution stays stable
ImageNet
Trained on the ILSVRC 2012 dataset
1000 labels
1.2M training images
150K test and validation images
ImageNet - Inception Model
GoogLeNet is an instance of the Inception architecture
Won the 2014 ImageNet competition
Used as the SOTA baseline
ImageNet - Inception Setup
Replace each 5x5 conv. layer with two consecutive 3x3 conv. layers (see the sketch below)
Increase the number of 28x28 Inception modules from 2 to 3
Use both average- and max-pooling inside the modules during training
No across-the-board pooling layers between any two Inception modules
Add stride-2 conv./pooling layers before the filter concatenation in modules 3c and 4e
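A sketch of the 5x5 -> two-3x3 factorization (channel counts are made up): the stacked pair covers the same 5x5 receptive field with fewer weights (roughly 18·c² vs 25·c²):

```python
from torch import nn

# A single 5x5 convolution ...
conv5x5 = nn.Conv2d(64, 64, kernel_size=5, padding=2)

# ... and its replacement: two stacked 3x3 convolutions with the same
# receptive field but fewer parameters.
conv3x3_pair = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
)
```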
ImageNet - BN Setup
Increase the learning rate
Remove Dropout
Reduce the L2 regularization by a factor of 5
Decay the learning rate 6 times faster
Remove Local Response Normalization
Shuffle training examples more thoroughly
Reduce the photometric distortions
ImageNet - BN Setup
BN-Baseline
Inception + BN before each nonlinearity
BN-x5 / BN-x30
BN-Baseline with the initial learning rate increased by a factor of 5 / 30 (0.0075 / 0.045)
BN-x5-Sigmoid
BN-x5 with sigmoid instead of ReLU
ImageNet - Result
x-axis: training steps
y-axis: validation accuracy
With BN, the same accuracy is reached in far fewer steps
Inception with sigmoid (no BN) never exceeds 1/1000 accuracy
ImageNet - Result
BN-x30 trains slower initially, but reaches a higher final accuracy
Higher learning rate, higher final accuracy
ImageNet - Result
BN-Baseline matches Inception's 72.2% accuracy in less than half the training steps
BN-x5 needs 14 times fewer steps than Inception to reach 72.2%
Dropout and Local Response Normalization are no longer needed
ImageNet Ensemble - Setup
Ensemble of 6 BN-x30 networks (the BN-Inception ensemble)
Increased initial weights in the conv. layers
Dropout with probability 5% or 10%
Non-convolutional, per-activation BN in the last hidden layer
Predictions based on the arithmetic average of the class probabilities (sketch below)
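A hypothetical sketch of the ensemble prediction rule (function and variable names are mine): average the class probabilities of the member networks:

```python
import torch

def ensemble_predict(models, x):
    # Class probabilities of each member network ...
    probs = [torch.softmax(m(x), dim=1) for m in models]
    # ... combined by their arithmetic average.
    return torch.stack(probs).mean(dim=0)
```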
ImageNet Ensemble - Result
The BN-Inception ensemble reaches 4.9% top-5 validation error on ImageNet, surpassing the previous SOTA
Outline
Introduction
Related Works
Methodology
Experimental Results
Conclusion
Conclusion
Reducing internal covariate shift speeds up training
Adding BN to a SOTA model yields a substantial training speedup
BN preserves model expressivity
BN allows higher learning rates
BN reduces the need for Dropout and careful parameter initialization
The BN-Inception ensemble beats the SOTA on ImageNet classification
THANK YOU
