Three classes of deep learning networks
 Deep learning refers to a rather wide class of
machine learning techniques and architectures,
with the hallmark of using many layers of non-
linear information processing that are
hierarchical in nature.
 Deep learning, or hierarchical learning, has
emerged as a new area of machine learning
research.
 The techniques developed from deep learning
research have already had an impact on a wide
range of signal and information processing work.
 Deep learning has various closely related
definitions or high-level descriptions:
 DEFINITION-1:
A class of machine learning techniques that
exploit many layers of non-linear information
processing for supervised or unsupervised
feature extraction and transformation, and for
pattern analysis and classification.
 NOTE:
The deep learning that we discuss in this
monograph is about learning with deep
architectures for signal and information
processing. It is not about deep understanding
of the signal or information.
 Depending on how the architectures and
techniques are intended for use, e.g.,
synthesis/generation or recognition/
classification, one can broadly categorize
most of the work in this area into three major
classes:
 1) Deep networks for unsupervised or
generative learning.
2) Deep networks for supervised learning.
3) Hybrid deep networks.
 Unsupervised learning refers to learning without
the use of task-specific supervision information
(e.g., target class labels).
 Deep networks in this category can be either
generative or non-generative in nature.
 Most of them are generative in nature (examples
include RBMs, DBNs, DBMs, and generalized
de-noising auto-encoders).
 Among the various subclasses of generative
or unsupervised deep networks, the energy-
based deep models are the most common.
 Deep belief network (DBN):
A probabilistic generative model composed of multiple
layers of stochastic, hidden variables. The top two
layers have undirected, symmetric connections between
them. The lower layers receive top-down, directed
connections from the layer above.
Boltzmann machine (BM):
A network of symmetrically connected, neuron-like
units that make stochastic decisions about whether to
be on or off.
Restricted Boltzmann machine (RBM):
A special type of BM consisting of a layer of visible
units and a layer of hidden units with no visible-visible
or hidden-hidden connections.
[Figures: deep belief network representation; Boltzmann machine; restricted Boltzmann machine]
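To make the RBM definition concrete, here is a minimal numpy sketch of training one with a single step of contrastive divergence (CD-1). All sizes, variable names, and the toy data are hypothetical illustrations, not a production implementation; the point is the bipartite structure with no visible-visible or hidden-hidden connections.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes: 6 binary visible units, 4 hidden units.
n_visible, n_hidden = 6, 4
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b_v = np.zeros(n_visible)   # visible biases
b_h = np.zeros(n_hidden)    # hidden biases

def cd1_update(v0, lr=0.1):
    """One CD-1 step on a batch of binary visible vectors."""
    global W, b_v, b_h
    # Positive phase: hidden activations given data (no hidden-hidden links).
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one Gibbs step back to a visible reconstruction.
    p_v1 = sigmoid(h0 @ W.T + b_v)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
    p_h1 = sigmoid(v1 @ W + b_h)
    # Approximate log-likelihood gradient: data statistics minus model statistics.
    n = v0.shape[0]
    W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / n
    b_v += lr * (v0 - v1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)

# Toy data: two repeated binary patterns.
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]] * 16, dtype=float)
for _ in range(200):
    cd1_update(data)
```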
 Deep neural network (DNN):
A multilayer perceptron with many hidden layers,
whose weights are fully connected and are often
(although not always) initialized using either an
unsupervised or a supervised pre-training
technique. (In the literature prior to 2012, a DBN
was often used incorrectly to mean a DNN.)
 Deep auto-encoder:
A “discriminative” DNN whose output
targets are the data input itself rather than class
labels; hence an unsupervised learning model.
When trained with a de-noising criterion, a deep
auto-encoder is also a generative model and can
be sampled from.
[Figure: deep auto-encoder]
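A minimal numpy sketch of one de-noising auto-encoder layer follows; all sizes and the toy data are hypothetical. It trains the network to reconstruct the clean input from a corrupted copy, which is the de-noising criterion mentioned above; a deep auto-encoder stacks several such encoder/decoder pairs.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes: 8 inputs, 3 hidden units (one denoising layer;
# a *deep* auto-encoder stacks several such layers).
n_in, n_hid = 8, 3
W1 = 0.1 * rng.standard_normal((n_in, n_hid));  b1 = np.zeros(n_hid)
W2 = 0.1 * rng.standard_normal((n_hid, n_in));  b2 = np.zeros(n_in)

def train_step(x, lr=0.5, noise=0.3):
    global W1, b1, W2, b2
    # De-noising criterion: corrupt the input, reconstruct the clean input.
    x_tilde = x * (rng.random(x.shape) > noise)      # randomly zero entries
    h = sigmoid(x_tilde @ W1 + b1)                   # encoder
    x_hat = sigmoid(h @ W2 + b2)                     # decoder
    # Backprop of squared reconstruction error; targets are the clean input x.
    d_out = (x_hat - x) * x_hat * (1 - x_hat)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    n = x.shape[0]
    W2 -= lr * h.T @ d_out / n;       b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * x_tilde.T @ d_hid / n; b1 -= lr * d_hid.mean(axis=0)

data = rng.integers(0, 2, size=(32, n_in)).astype(float)
for _ in range(500):
    train_step(data)
```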
 Another prominent type of deep unsupervised
model with generative capability is the deep
Boltzmann machine, or DBM. A DBM contains many
layers of hidden variables, and has no connections
between the variables within the same layer.
 When the number of hidden layers of a DBM is
reduced to one, we obtain the restricted Boltzmann
machine (RBM).
 Another representative deep generative
network that can be used for unsupervised
(as well as supervised) learning is the sum–
product network or SPN.
 An SPN is a directed acyclic graph with the
observed variables as leaves, and with sum
and product operations as internal nodes in
the deep network.
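The sum and product semantics can be shown in a few lines. Below is a hypothetical two-variable SPN sketch: leaves are Bernoulli distributions over observed variables, a product node factorizes over distinct variables, and a sum node mixes product nodes with weights summing to one, so the root computes a valid joint probability.

```python
# A tiny sum-product network over two binary variables X1, X2.
# The structure and all parameter values are hypothetical.

def leaf(p_one, x):
    """Bernoulli leaf: probability of the observed value x."""
    return p_one if x == 1 else 1.0 - p_one

def spn_prob(x1, x2):
    # Product nodes combine distributions over *different* variables.
    prod_a = leaf(0.9, x1) * leaf(0.8, x2)
    prod_b = leaf(0.2, x1) * leaf(0.1, x2)
    # A sum node mixes the product nodes (weights sum to 1),
    # so the root is a valid joint probability.
    return 0.6 * prod_a + 0.4 * prod_b

# Probabilities over all assignments sum to 1.
total = sum(spn_prob(a, b) for a in (0, 1) for b in (0, 1))
print(spn_prob(1, 1), total)  # total == 1.0 up to float rounding
```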
 Recurrent neural networks (RNNs) can be
considered another class of deep networks
for unsupervised (as well as supervised)
learning, where the depth can be as large as
the length of the input data sequence.
 An RNN can be used to predict future samples
of a data sequence from the previous samples.
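The sketch below shows the unrolled forward computation of a simple RNN predicting the next sample at each step (numpy; the weights are random and untrained, and all sizes are hypothetical). The recurrence makes the effective depth equal to the sequence length, as noted above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sizes: scalar input, 8 hidden units, scalar prediction.
n_in, n_hid = 1, 8
W_xh = 0.5 * rng.standard_normal((n_in, n_hid))
W_hh = 0.5 * rng.standard_normal((n_hid, n_hid))
W_hy = 0.5 * rng.standard_normal((n_hid, n_in))

def rnn_predict(seq):
    """Unrolled forward pass: at each step, predict the next sample.
    The effective depth equals the sequence length."""
    h = np.zeros(n_hid)
    preds = []
    for x_t in seq:
        h = np.tanh(np.atleast_1d(x_t) @ W_xh + h @ W_hh)  # recurrent state
        preds.append(h @ W_hy)                             # next-step guess
    return np.array(preds)

seq = np.sin(np.linspace(0, 3, 20))
print(rnn_predict(seq).shape)  # one prediction per time step: (20, 1)
```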
 Deep generative graphical models are a
powerful tool in many applications due to
their capability of embedding domain
knowledge.
 However, they are often used with
inappropriate approximations in inference,
learning, prediction, and topology design, all
arising from inherent intractability in these
tasks for most real-world applications.
 Finally, dynamic or temporally recursive
generative models based on neural network
architectures can be found in the literature for
human motion modeling, and for natural language
and natural scene parsing.
 Many of the discriminative techniques for
supervised learning in signal and information
processing are shallow architectures, such as
HMMs and conditional random fields (CRFs).
 A CRF is intrinsically a shallow discriminative
architecture, characterized by the linear
relationship between the input features and the
transition features.
 Deep-structured CRFs have been developed by
stacking the output of each lower layer of the
CRF, together with the original input data, onto
its higher layer.
 Other major existing discriminative models
in speech recognition are based mainly on the
traditional neural network or MLP architecture,
using back-propagation learning with
random initialization.
 This line of work argues for the importance of
both the increased width of each layer of the
neural network and the increased depth.
 A new deep learning architecture, sometimes
called the deep stacking network (DSN), together
with its tensor variant and its kernel version,
has been developed; all of these focus on
discrimination with scalable, parallelizable,
block-wise learning relying on little or no
generative component.
[Figure: deep stacking network]
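A minimal numpy sketch of the stacking principle follows (the same idea underlies the deep-structured CRFs above): each module sees the raw input concatenated with the outputs of all earlier modules, and its upper-layer weights have a closed-form ridge-regression solution, which is what makes DSN learning block-wise and parallelizable. The random lower-layer weights, sizes, and toy data here are simplifying assumptions; actual DSNs initialize the lower weights more carefully.

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dsn_fit(X, Y, n_modules=3, n_hid=20, ridge=1e-2):
    """Stack simple modules: module k sees the raw input X concatenated
    with every earlier module's output. Upper weights U solved in closed
    form (ridge regression), so each block trains independently."""
    modules, outputs = [], []
    for _ in range(n_modules):
        feats = np.hstack([X] + outputs) if outputs else X
        W = 0.5 * rng.standard_normal((feats.shape[1], n_hid))  # lower weights
        H = sigmoid(feats @ W)                                  # hidden layer
        U = np.linalg.solve(H.T @ H + ridge * np.eye(n_hid), H.T @ Y)
        modules.append((W, U))
        outputs.append(H @ U)        # this module's output feeds the next block
    return modules

X = rng.standard_normal((64, 5))
Y = (X[:, :1] > 0).astype(float)     # toy binary target
modules = dsn_fit(X, Y)
```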
 Another type of discriminative deep
architecture is the convolutional neural
network (CNN), in which each module
consists of a convolutional layer and a
pooling layer. These modules are often
stacked one on top of another, or with a
DNN on top, to form a deep model.
 The weight sharing in the convolutional
layer, together with appropriately chosen
pooling schemes, endows the CNN with some
“invariance” properties.
[Figure: convolutional neural network]
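Below is a minimal numpy sketch of a single CNN module: a valid 2-D convolution with one shared kernel (the weight sharing), a ReLU nonlinearity, and non-overlapping max pooling. Sizes and data are hypothetical; real CNNs use many kernels and channels.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: one shared kernel slides over the image,
    which is the weight sharing that underlies the CNN's invariance."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling over size x size windows."""
    H, W = fmap.shape
    H2, W2 = H // size, W // size
    return fmap[:H2*size, :W2*size].reshape(H2, size, W2, size).max(axis=(1, 3))

def cnn_module(image, kernel):
    # One CNN module: convolution, nonlinearity, then pooling.
    return max_pool(np.maximum(conv2d(image, kernel), 0.0))

rng = np.random.default_rng(4)
img = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))
print(cnn_module(img, kernel).shape)  # (3, 3)
```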
 CNNs have been found highly effective and are
commonly used in computer vision and image
recognition. More recently, with appropriate
changes that adapt the CNN designed for image
analysis to speech-specific properties, the CNN
has also been found effective for speech
recognition.
 The time-delay neural network (TDNN),
developed for early speech recognition, is a
special case and predecessor of the CNN in
which weight sharing is limited to one of the
two dimensions (the time dimension) and
there is no pooling layer.
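A sketch of one TDNN layer follows (numpy; the 13-dimensional frames, 5-frame context window, and output size are hypothetical). The same weights are applied at every time step, i.e., weight sharing along time only, and no pooling follows the convolution.

```python
import numpy as np

def tdnn_layer(frames, weights):
    """TDNN layer: weights are shared across time only (one of the two
    dimensions), and there is no pooling. `frames` is (T, n_features);
    `weights` is (context, n_features, n_out), covering a window of
    `context` consecutive frames."""
    T, _ = frames.shape
    context, _, n_out = weights.shape
    out = np.empty((T - context + 1, n_out))
    for t in range(out.shape[0]):
        window = frames[t:t+context]                 # (context, n_features)
        out[t] = np.tanh(np.einsum('cf,cfo->o', window, weights))
    return out

rng = np.random.default_rng(5)
frames = rng.standard_normal((50, 13))              # e.g., 13 features per frame
weights = 0.1 * rng.standard_normal((5, 13, 8))     # 5-frame context window
print(tdnn_layer(frames, weights).shape)            # (46, 8)
```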
 The model of hierarchical temporal memory
(HTM) is another variant and extension of the
CNN. The extension includes the following
aspects:
(1) Time or temporal dimension is introduced to
serve as the “supervision” information for
discrimination (even for static images);
(2) Both bottom-up and top-down information
flows are used, instead of just bottom-up in the
CNN; and
(3) A Bayesian probabilistic formalism is used for
fusing information and for decision making.
 A typical HTM network is a tree-shaped
hierarchy of levels that are composed of
smaller elements called nodes or columns.
 A single level in the hierarchy is also called a
region.
 Higher hierarchy levels often have fewer
nodes and therefore less spatial resolvability.
 Higher hierarchy levels can reuse patterns
learned at the lower levels by combining
them to memorize more complex patterns.
[Figure: an example of an HTM hierarchy used for image recognition]
 The learning architecture for bottom-up,
detection-based speech recognition, proposed in
2004 and developed further since then, notably
using the DBN–DNN technique, can also be placed
in the discriminative or supervised-learning deep
architecture category.
 The term “hybrid” generally indicates the result
of combining components.
 For this category, it refers to deep architectures
that either comprise or make use of both generative
and discriminative model components.
 In the existing hybrid architectures, the
generative component is mostly exploited to
help with discrimination, which is the goal of
the hybrid architecture.
 From the optimization viewpoint, generative
models trained in an unsupervised fashion can
provide excellent initialization points for
highly nonlinear parameter estimation problems.
 From the regularization perspective,
unsupervised-learning models can effectively
provide a prior on the set of functions
representable by the model.
 The DBN, a generative deep network for
unsupervised learning discussed earlier, can
be converted to and used as the initial model
of a DNN for supervised learning with the
same network structure, which is then further
discriminatively trained or fine-tuned using
the target labels provided.
 We can consider this a DBN-DNN model, where a
model trained on unlabeled data helps make the
discriminative model effective for supervised
learning, as sketched below.
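Here is a compact numpy sketch of that pipeline under simplifying assumptions: each layer is pre-trained as an RBM with CD-1 (visible biases omitted for brevity), and the supervised stage trains only an added logistic output layer, whereas a full DBN-DNN would back-propagate through all layers. All sizes and the toy data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def pretrain_rbm(data, n_hid, epochs=50, lr=0.1):
    """Greedy unsupervised pre-training of one layer with CD-1 (simplified)."""
    W = 0.01 * rng.standard_normal((data.shape[1], n_hid))
    b = np.zeros(n_hid)
    for _ in range(epochs):
        p_h = sigmoid(data @ W + b)
        h = (rng.random(p_h.shape) < p_h).astype(float)
        v1 = sigmoid(h @ W.T)                 # reconstruction (biases omitted)
        p_h1 = sigmoid(v1 @ W + b)
        W += lr * (data.T @ p_h - v1.T @ p_h1) / len(data)
        b += lr * (p_h - p_h1).mean(axis=0)
    return W, b

def dbn_to_dnn(X, y, layer_sizes=(16, 8)):
    """Stack RBM-pretrained layers, then fine-tune an added logistic
    output layer on the target labels (full training would also
    back-propagate through the pretrained layers)."""
    weights, h = [], X
    for n_hid in layer_sizes:                 # unsupervised, layer by layer
        W, b = pretrain_rbm(h, n_hid)
        weights.append((W, b))
        h = sigmoid(h @ W + b)
    w_out = np.zeros(h.shape[1])              # supervised stage
    for _ in range(200):
        p = sigmoid(h @ w_out)
        w_out -= 0.5 * h.T @ (p - y) / len(y) # logistic-loss gradient step
    return weights, w_out

X = rng.integers(0, 2, size=(64, 12)).astype(float)
y = X[:, 0]                                   # toy labels
dbn_to_dnn(X, y)
```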
 Another example of the hybrid deep network has
been developed in the literature, where the DNN
weights are also initialized from a generative DBN
but are further fine-tuned with a sequence-level
discriminative criterion: the conditional
probability of the label sequence given the input
feature sequence, instead of the commonly used
frame-level cross-entropy criterion.
 It can be shown that such a DNN–CRF is
equivalent to a hybrid deep architecture of a DNN
and an HMM, i.e., a DNN-HMM.
 Another example of hybrid deep networks is the
use of generative models of DBNs to pre-train
deep convolutional neural networks.
 As with the fully connected DNN discussed earlier,
pre-training also helps to improve the
performance of deep CNNs over random
initialization.
 Pre-training DNNs or CNNs using a set of
regularized deep auto-encoders, including
de-noising auto-encoders, contractive
auto-encoders, and sparse auto-encoders, is a
similar example of hybrid deep networks.
THANK YOU
