DEEP LEARNING
Dr.S.SHAJUN NISHA, MCA.,M.Phil.,M.Tech.,MBA.,Ph.D.,
Assistant Professor & Head,
PG & Research Dept. of Computer Science
Sadakathullah Appa College (Autonomous),
Tirunelveli.
shajunnisha_s@yahoo.com
+91 99420 96220
Introduction
• Deep learning is a form of machine learning that uses a model of computing inspired by the structure of the brain; hence we call this model a neural network. The basic foundational unit of a neural network is the neuron.
• Each neuron has a set of inputs, each of which is given a specific weight. The neuron computes some function on these weighted inputs: a linear neuron takes a linear combination of the weighted inputs and then applies an activation function (sigmoid, tanh, etc.).
• With a sigmoid activation, the network feeds the weighted sum of the inputs into the logistic function, which returns a value between 0 and 1. When the weighted sum is very negative, the return value is very close to 0; when the weighted sum is very large and positive, the return value is very close to 1.
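A minimal sketch of this weighted-sum-plus-sigmoid computation in Python/NumPy (the input, weight and bias values below are made-up illustrative numbers, not taken from the slides):

import numpy as np

def sigmoid(z):
    # Logistic function: squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(inputs, weights, bias):
    # Weighted sum of the inputs plus a bias, passed through the activation
    return sigmoid(np.dot(weights, inputs) + bias)

# Two inputs, two weights and a bias (arbitrary values)
x = np.array([0.5, -1.2])
w = np.array([0.8, 0.3])
print(neuron_output(x, w, bias=0.1))   # a value between 0 and 1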
[Figures: Biological Neuron vs. Artificial Neuron; Number of Neurons in Species]
Introduction
 Software used in Deep Learning
 Theano: Python-based Deep Learning library
 TensorFlow: Google’s Deep Learning library, with Python and C++ APIs
 Keras / Lasagne: Lightweight wrappers that sit on top of Theano/TensorFlow and enable faster model prototyping
 Torch: Lua-based Deep Learning library with wide support for machine learning algorithms
 Caffe: Deep Learning library primarily used for image processing
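As an illustration of how these libraries are used, here is a minimal Keras sketch of a small feed-forward classifier (the layer sizes and the 8-feature input shape are arbitrary placeholders, not taken from the slides):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Two hidden layers with ReLU activations and a sigmoid output for binary classification
model = Sequential([
    Dense(16, activation='relu', input_shape=(8,)),   # 8 input features (placeholder)
    Dense(16, activation='relu'),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()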
FUNDAMENTALS
 Activation Functions: Every activation function takes a single number and performs a certain fixed mathematical operation on it. The most popularly used activation functions in Deep Learning are
 Sigmoid
 Tanh
 ReLU
 Sigmoid: It has the mathematical form σ(x) = 1 / (1 + e^(−x)). It takes a real-valued number and squashes it into the range between 0 and 1. Sigmoid is a popular choice because its derivative is easy to calculate and its output is easy to interpret.
 Tanh: Tanh squashes a real-valued number to the range [-1, 1], and its output is zero-centered. In practice the tanh non-linearity is generally preferred to the sigmoid non-linearity. It can also be shown that tanh is a scaled sigmoid: tanh(x) = 2σ(2x) − 1
[Figures: Sigmoid Activation Function; Tanh Activation Function]
FUNDAMENTALS
 ReLU (Rectified Linear Unit): ReLU has become very popular in the last few years. It computes the function f(x) = max(0, x), i.e. the activation is simply thresholded at zero.
 Linear: The linear activation function is used in linear regression problems; its derivative is always 1 because the function is f(x) = x.
 ReLU is now popularly used in place of sigmoid or tanh because of its better convergence properties.
[Figures: ReLU Activation Function; Linear Activation Function]
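A small NumPy sketch of the four activation functions described above, including a check of the tanh/sigmoid identity:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))    # squashes to (0, 1)

def tanh(x):
    return np.tanh(x)                  # squashes to (-1, 1), zero-centered

def relu(x):
    return np.maximum(0.0, x)          # thresholds at zero

def linear(x):
    return x                           # identity, derivative is always 1

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (sigmoid, tanh, relu, linear):
    print(f.__name__, f(x))

# tanh is a scaled sigmoid: tanh(x) = 2*sigmoid(2x) - 1
assert np.allclose(tanh(x), 2 * sigmoid(2 * x) - 1)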
FUNDAMENTALS
 Forward Propagation & Backpropagation: During the forward propagation stage, features are fed into the network and passed forward through its layers to produce the outputs. However, we can calculate the error of the network only at the output layer; to update the weights towards their optimal values, we must propagate the network’s errors backwards through its layers.

Forward Propagation of Layer 1 Neurons (Input Layer -> Hidden Layer 1):
Activation(Hidden1) = g(BiasWeight1 + Weight1 * Input1 + Weight2 * Input2)

Forward Propagation of Layer 2 Neurons (Hidden Layer 1 -> Hidden Layer 2):
Activation(Hidden4) = g(BiasWeight4 + Weight7 * Hidden1 + Weight8 * Hidden2 + Weight9 * Hidden3)

[Figures: example network with Input Layer (Input1, Input2), Hidden Layer 1 (Hidden1-Hidden3), Hidden Layer 2 (Hidden4-Hidden6) and Output Layer (Output1, Output2), showing the weights used in the equations above]
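A sketch of this forward pass in NumPy, using the layer sizes from the slide diagram (2 inputs, two hidden layers of 3 neurons, 2 outputs); the sigmoid is assumed for g, and the randomly initialized weights are placeholders:

import numpy as np

rng = np.random.default_rng(0)

def g(z):
    # Activation function; a sigmoid is assumed here
    return 1.0 / (1.0 + np.exp(-z))

# Layer sizes from the diagram: 2 -> 3 -> 3 -> 2
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # Input Layer -> Hidden Layer 1
W2, b2 = rng.normal(size=(3, 3)), np.zeros(3)   # Hidden Layer 1 -> Hidden Layer 2
W3, b3 = rng.normal(size=(2, 3)), np.zeros(2)   # Hidden Layer 2 -> Output Layer

def forward(x):
    h1 = g(W1 @ x + b1)    # e.g. Hidden1 = g(BiasWeight1 + Weight1*Input1 + Weight2*Input2)
    h2 = g(W2 @ h1 + b2)   # e.g. Hidden4 = g(BiasWeight4 + Weight7*Hidden1 + ...)
    y  = g(W3 @ h2 + b3)   # e.g. Output1 = g(BiasWeight7 + Weight16*Hidden4 + ...)
    return h1, h2, y

h1, h2, y = forward(np.array([0.5, -1.0]))
print(y)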
FUNDAMENTALS
Forward Propagation of Output Layer Neurons (Hidden Layer 2 -> Output Layer):
Activation(Output1) = g(BiasWeight7 + Weight16 * Hidden4 + Weight17 * Hidden5 + Weight18 * Hidden6)

Backpropagation of Output Layer Neurons:
Error(Output1) = g'(Output1) * (True1 - Output1)

[Figures: the same network diagram, highlighting the output-layer weights (Weight16-Weight18, BiasWeight7) during forward propagation and the output-layer errors during backpropagation]
FUNDAMENTALS
Backpropagation of Hidden Layer 2 Neurons:
Error(Hidden4) = g'(Hidden4) * (Weight16 * Error(Output1) + Weight19 * Error(Output2))

Backpropagation of Hidden Layer 1 Neurons:
Error(Hidden1) = g'(Hidden1) * (Weight7 * Error(Hidden4) + Weight10 * Error(Hidden5) + Weight13 * Error(Hidden6))

[Figures: the same network diagram, highlighting the weights (Weight7, Weight10, Weight13, Weight16, Weight19) along which the downstream errors are propagated back to the hidden layers]
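Continuing the forward-pass sketch above, the error terms can be computed layer by layer; the sigmoid derivative g'(a) = a * (1 - a), written in terms of the activation, is assumed:

import numpy as np

def g_prime(a):
    # Derivative of the sigmoid expressed via its output
    return a * (1.0 - a)

def backprop_errors(h1, h2, y, y_true, W2, W3):
    # Output layer: Error(Output) = g'(Output) * (True - Output)
    err_out = g_prime(y) * (y_true - y)
    # Hidden Layer 2: g'(Hidden) times the weighted sum of downstream errors
    err_h2 = g_prime(h2) * (W3.T @ err_out)
    # Hidden Layer 1: the same rule, one layer further back
    err_h1 = g_prime(h1) * (W2.T @ err_h2)
    return err_out, err_h2, err_h1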
FUNDAMENTALS
Updating the weights between the Input Layer and Hidden Layer 1 during the backward pass (a is the learning rate):
BiasWeight1 = BiasWeight1 + a * Error(Hidden1) * 1
Weight1 = Weight1 + a * Error(Hidden1) * Input1
Weight2 = Weight2 + a * Error(Hidden1) * Input2

Updating the weights between Hidden Layer 1 and Hidden Layer 2 during the backward pass:
BiasWeight4 = BiasWeight4 + a * Error(Hidden4) * 1
Weight7 = Weight7 + a * Error(Hidden4) * Hidden1
Weight8 = Weight8 + a * Error(Hidden4) * Hidden2
Weight9 = Weight9 + a * Error(Hidden4) * Hidden3

[Figures: the same network diagram, highlighting the connections whose weights are updated by each group of equations]
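The same update rule written with matrices, continuing the sketch above (the outer product collects the per-connection updates of a whole layer at once; the learning rate a is an arbitrary value):

import numpy as np

def update_weights(W, b, error, upstream_activation, a=0.1):
    # Weight = Weight + a * Error(neuron) * upstream activation, for every connection
    W += a * np.outer(error, upstream_activation)
    # BiasWeight = BiasWeight + a * Error(neuron) * 1
    b += a * error
    return W, b

# e.g. W1, b1 = update_weights(W1, b1, err_h1, x)    # Input Layer -> Hidden Layer 1
#      W2, b2 = update_weights(W2, b2, err_h2, h1)   # Hidden Layer 1 -> Hidden Layer 2
#      W3, b3 = update_weights(W3, b3, err_out, h2)  # Hidden Layer 2 -> Output Layer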
FUNDAMENTALS
 Dropout: Dropout is a regularization technique in neural networks used to avoid overfitting the data. Typically the keep probability is about 0.8 in the initial layers (80% of the neurons are randomly kept at any time) and about 0.5 in the middle layers; a small sketch follows after the list below.
 Optimization: Various techniques are used to optimize the weights, including
 SGD (Stochastic Gradient Descent)
 Momentum
 NAG (Nesterov Accelerated Gradient)
 Adagrad (Adaptive Gradient)
 Adadelta
 RMSprop
 Adam (Adaptive Moment Estimation)
In practice Adam is a good default choice; if you can afford full-batch updates, then try out L-BFGS.
[Figures: Application of Dropout in a Neural Network; Optimization over an Error Surface]
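A minimal NumPy sketch of (inverted) dropout applied to a layer's activations during training, as referenced above; the keep probability is the tunable parameter:

import numpy as np

def dropout(activations, keep_prob=0.8, training=True):
    # Randomly zero out neurons with probability (1 - keep_prob);
    # dividing by keep_prob keeps the expected activation unchanged (inverted dropout)
    if not training:
        return activations
    mask = np.random.rand(*activations.shape) < keep_prob
    return activations * mask / keep_prob

h = np.array([0.2, 0.9, 0.4, 0.7, 0.1])
print(dropout(h, keep_prob=0.8))   # some entries zeroed at random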
FUNDAMENTALS
 Stochastic Gradient Descent (SGD): Gradient descent is a way to minimize an objective function J(θ), parameterized by a model’s parameters θ ∈ R^d, by updating the parameters in the direction opposite to the gradient of the objective function with respect to the parameters. The learning rate determines the size of the steps taken to reach the minimum.
 Batch Gradient Descent (all training observations per iteration)
 SGD (1 observation per iteration)
 Mini-Batch Gradient Descent (about 50 training observations per iteration)
 Momentum: SGD has trouble navigating surfaces that curve much more steeply in one dimension than in another; in these scenarios SGD oscillates across the slopes while making slow progress, and momentum accelerates SGD in the relevant direction while dampening the oscillations (see the sketch below).
[Figures: Gradient Descent; Comparison of SGD without & with Momentum]
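A sketch of the plain SGD and momentum update rules (the learning rate and momentum coefficient below are arbitrary values):

import numpy as np

def sgd_step(theta, grad, lr=0.01):
    # theta = theta - learning rate * gradient of J w.r.t. theta
    return theta - lr * grad

def momentum_step(theta, grad, velocity, lr=0.01, gamma=0.9):
    # A velocity vector accumulates past gradients, smoothing and accelerating the updates
    velocity = gamma * velocity + lr * grad
    return theta - velocity, velocity

# Example: minimizing J(theta) = theta^2, whose gradient is 2*theta
theta, v = 5.0, 0.0
for _ in range(200):
    theta, v = momentum_step(theta, 2 * theta, v, lr=0.1)
print(theta)   # close to 0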
FUNDAMENTALS
• Nesterov Accelerated Gradient (NAG): A ball that rolls down a hill blindly following the slope is highly unsatisfactory; it should have a notion of where it is going, so that it knows to slow down before the hill slopes up again. NAG is a way to give the momentum term this kind of prescience.
• Adagrad: Adagrad is an algorithm for gradient-based optimization that adapts the learning rate to the parameters, performing larger updates for infrequently updated parameters and smaller updates for frequently updated ones.
[Figure: Nesterov Momentum Update]
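A sketch of the Adagrad update, which divides the learning rate by the root of the per-parameter accumulated squared gradients (epsilon avoids division by zero; the values are placeholders):

import numpy as np

def adagrad_step(theta, grad, grad_sq_sum, lr=0.01, eps=1e-8):
    # Accumulate squared gradients separately for each parameter
    grad_sq_sum = grad_sq_sum + grad ** 2
    # Parameters with large accumulated gradients receive smaller effective learning rates
    theta = theta - lr * grad / (np.sqrt(grad_sq_sum) + eps)
    return theta, grad_sq_sum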
FUNDAMENTALS
 Adadelta: Adadelta is an extension of Adagrad that seeks to reduce its aggressive, monotonically decreasing learning rate. Instead of accumulating all past squared gradients, Adadelta restricts the window of accumulated past gradients to some fixed size w.
 RMSprop: RMSprop and Adadelta were both developed independently around the same time to resolve Adagrad’s radically diminishing learning rates.
 Adam (Adaptive Moment Estimation): Adam is another method that computes adaptive learning rates for each parameter. In addition to storing an exponentially decaying average of past squared gradients like Adadelta and RMSprop, Adam also keeps an exponentially decaying average of past gradients, similar to momentum.
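A sketch of the Adam update with its usual default hyper-parameters (beta1 = 0.9, beta2 = 0.999), including the bias correction of the two moment estimates (t is the time step, starting at 1):

import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # m: exponentially decaying average of past gradients (like momentum)
    # v: exponentially decaying average of past squared gradients (like RMSprop/Adadelta)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v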
Deep Architecture of ANN
(Artificial Neural Network)
 All hidden layers should have the same number of neurons per layer
 Typically 2 hidden layers are good enough to solve the majority of problems
 Using scaling/batch normalization (mean 0, variance 1) for all input variables after each layer improves convergence effectiveness
 Reducing the step size after each iteration improves convergence, in addition to the use of momentum & Dropout
[Figure: Decision Boundary of a Deep Architecture]
CONVOLUTIONAL NEURAL NETWORKS
 Convolutional Neural Networks are used in image analysis, including image captioning, digit recognition and various visual processing systems, e.g. vision detection in self-driving cars, handwritten digit recognizers and Google DeepMind’s AlphaGo.
[Figures: Object recognition and classification using Convolutional Networks; CNN application in Self-Driving Cars; CNN application in Handwritten Digit Recognition]
CONVOLUTIONAL NEURAL NETWORKS
• In 1959, Hubel & Wiesel inserted microscopic electrodes into the visual cortex of an anesthetized cat to read the activity of single cells in the visual cortex while presenting various stimuli to its eyes; for this work they received the Nobel Prize in Medicine in 1981. Hubel & Wiesel discovered that vision is hierarchical, consisting of simple cells, complex cells & hyper-complex cells.
[Figures: Hubel & Wiesel’s experiments on cat vision; Formation of features over layers using Neural Networks; Object Detection; Vision as a hierarchical phenomenon]
CONVOLUTIONAL NEURAL NETWORKS
• The input layer/picture consists of 32 x 32 pixels with 3 colors (Red, Green & Blue), i.e. 32 x 32 x 3
• A convolution layer is formed by running a filter (5 x 5 x 3) over the input layer, which results in a 28 x 28 x 1 activation map
[Figures: Input Layer & Filter; Running the filter over the Input Layer to form a Convolution layer; Complete Convolution Layer from the filter]
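The 28 x 28 size follows from the standard output-size formula (W - F + 2P) / S + 1; a small sketch, assuming stride 1 and no padding:

def conv_output_size(input_size, filter_size, padding=0, stride=1):
    # Standard formula: (W - F + 2P) / S + 1
    return (input_size - filter_size + 2 * padding) // stride + 1

# A 32 x 32 x 3 input convolved with a 5 x 5 x 3 filter gives a 28 x 28 x 1 activation map
print(conv_output_size(32, 5))   # 28
# With 6 such filters the stacked output is 28 x 28 x 6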
CONVOLUTIONAL NEURAL NETWORKS
 A 2nd convolution map is created in a similar way with another filter
 After striding/convolving with 6 filters, a new layer is created with dimensions 28 x 28 x 6
[Figures: Complete Convolution layer from Filter 2; Convolution layers created with 6 Filters; Formation of the complete 2nd layer]
CONVOLUTIONAL NEURAL NETWORKS
• Pooling Layer: The pooling layer makes the representation smaller and more manageable. It operates over each activation map independently. Pooling is applied to the width and breadth of the layer, while the depth remains the same during the pooling stage.
• Padding: The size of the image (width & breadth) shrinks with each convolution; this is undesirable in deep networks, so padding keeps the size of the picture constant or controllable throughout the network.
[Figures: Max pooling working methodology; Max pool layer after pooling; Zero padding on a 6 x 6 picture]
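A minimal NumPy sketch of 2 x 2 max pooling with stride 2 on a single activation map, together with zero padding (the 4 x 4 example map is arbitrary):

import numpy as np

def max_pool_2x2(activation_map):
    # Split the map into non-overlapping 2 x 2 blocks and keep the maximum of each block
    h, w = activation_map.shape
    blocks = activation_map[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

def zero_pad(activation_map, pad=1):
    # Surround the map with a border of zeros so that convolutions preserve its size
    return np.pad(activation_map, pad, mode='constant')

a = np.arange(16).reshape(4, 4)
print(max_pool_2x2(a))     # 2 x 2 map of block-wise maxima; depth is untouched by pooling
print(zero_pad(a).shape)   # (6, 6)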
RECURRENT NEURAL NETWORKS
 Recurrent neural networks are very useful for remembering sequences, time-series forecasting, image captioning, machine translation, etc.
 RNNs are useful for building A.I. chatbots, in which a sequence of words, with all its syntax & semantics, is remembered and subsequently used to provide answers to the given questions.
[Figures: Recurrent Neural Networks; Image Captioning using Convolutional and Recurrent Neural Networks; Application of RNNs in an A.I. Chatbot]
RECURRENT NEURAL NETWORKS
• A recurrent neural network processes a sequence of vectors x by applying the same recurrence formula at every time step: the new hidden state h_t = f_W(h_{t-1}, x_t) is computed from the previous hidden state and the current input.
[Figure: unrolled Recurrent Neural Network, with an input x_t and an output y_t at each time step]
• Vanilla Network
• Image Captioning (image -> sequence of words)
• Sentiment Classification (sequence of words -> sentiment)
• Machine Translation (sequence of words -> sequence of words)
• Video Classification at the frame level
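A sketch of the vanilla RNN recurrence with a tanh non-linearity, h_t = tanh(W_hh * h_{t-1} + W_xh * x_t) and y_t = W_hy * h_t (the sizes and random weights are placeholders):

import numpy as np

rng = np.random.default_rng(0)
hidden_size, input_size, output_size = 4, 3, 2

W_xh = rng.normal(size=(hidden_size, input_size))    # input  -> hidden
W_hh = rng.normal(size=(hidden_size, hidden_size))   # hidden -> hidden (the recurrence)
W_hy = rng.normal(size=(output_size, hidden_size))   # hidden -> output

def rnn_step(h_prev, x_t):
    # The same recurrence formula is applied at every time step
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)
    y_t = W_hy @ h_t
    return h_t, y_t

h = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):   # a sequence of 5 input vectors
    h, y = rnn_step(h, x_t)
print(y)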
RECURRENT NEURAL NETWORKS
 Vanishing gradient problem with RNNs: Gradients vanish quickly as the number of layers (and time steps) grows, and this issue is especially severe for RNNs. Vanishing gradients lead to slow training. LSTM & GRU units are used to avoid this issue.
 LSTM (Long Short-Term Memory): An LSTM is an artificial neural network that contains LSTM blocks in addition to regular network units. An LSTM block contains gates that determine when the input is significant enough to remember, when it should continue to remember or forget the value, and when it should output the value.
[Figures: LSTM Working Principle (Backpropagation); LSTM Cell]
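A compact, simplified sketch of a single LSTM step showing the three gates described above (input, forget, output) plus the candidate cell value; bias terms are omitted and the randomly initialized weights are placeholders:

import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_input = 4, 3

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate, acting on the concatenation of [h_prev, x_t]
W_i, W_f, W_o, W_c = (rng.normal(size=(n_hidden, n_hidden + n_input)) for _ in range(4))

def lstm_step(h_prev, c_prev, x_t):
    z = np.concatenate([h_prev, x_t])
    i = sigmoid(W_i @ z)         # input gate: is the input significant enough to remember?
    f = sigmoid(W_f @ z)         # forget gate: should the stored value be kept or forgotten?
    o = sigmoid(W_o @ z)         # output gate: should the value be output now?
    c_tilde = np.tanh(W_c @ z)   # candidate cell value
    c_t = f * c_prev + i * c_tilde
    h_t = o * np.tanh(c_t)
    return h_t, c_t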
Deep Autoencoders
 Deep Autoencoder: An autoencoder is a neural network for unsupervised learning that is trained with backpropagation. Stacking layers of autoencoders produces a deeper architecture known as a Stacked or Deep Autoencoder.
 Autoencoders are applied in face recognition, speech recognition, signal denoising, etc.
[Figures: PCA vs Deep Autoencoder on MNIST data; Face Recognition using Deep Autoencoders]
Deep Autoencoders
• Deep Autoencoder: An autoencoder neural network is an unsupervised learning algorithm that applies backpropagation, setting the target values equal to the inputs, i.e. it uses y(i) = x(i).
• Typically a deep autoencoder is composed of two segments: an encoding network and a decoding network.
[Figures: Deep Autoencoder examples; Training a Deep Autoencoder; Autoencoder with a Classifier; Reconstruction of features with the weight transpose]
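A minimal Keras sketch of such an encoder/decoder pair trained to reproduce its own input; the 784-dimensional input (a flattened 28 x 28 image) and the 32-dimensional code are illustrative choices, not values from the slides:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

autoencoder = Sequential([
    # Encoding network: compress the input down to a small code
    Dense(128, activation='relu', input_shape=(784,)),
    Dense(32, activation='relu'),
    # Decoding network: reconstruct the input from the code
    Dense(128, activation='relu'),
    Dense(784, activation='sigmoid'),
])
autoencoder.compile(optimizer='adam', loss='mse')

# Unsupervised training: the target is the input itself, i.e. y(i) = x(i)
x = np.random.rand(256, 784).astype('float32')   # placeholder data
autoencoder.fit(x, x, epochs=1, batch_size=32, verbose=0)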