Three classes of deep learning networks
 Deep learning refers to a rather wide class of
machine learning techniques and architectures,
with the hallmark of using many layers of non-
linear information processing that are
hierarchical in nature.
 Deep learning, or hierarchical learning, has
emerged as a new area of machine learning
research.
 The techniques developed from deep learning
research have already had an impact on a wide
range of signal and information processing work.
 Deep learning has various closely related
definitions or high-level descriptions:
 DEFINITION-1:
A class of machine learning techniques that
exploit many layers of non-linear information
processing for supervised or unsupervised
feature extraction and transformation, and for
pattern analysis and classification.
 NOTE:
The deep learning that we discuss in this
monograph is about learning with deep
architectures for signal and information
processing. It is not about deep understanding
of the signal or information.
 Depending on how the architectures and
techniques are intended for use, e.g.,
synthesis/generation or recognition/
classification, one can broadly categorize
most of the work in this area into three major
classes:
 1) Deep networks for unsupervised or
generative learning.
2) Deep networks for supervised learning.
3) Hybrid deep networks.
 Unsupervised learning refers to learning without
the use of task-specific supervision information
(e.g., target class labels).
 Deep networks in this category can be either
generative or non-generative in nature.
 Most of them are generative in nature (examples
include RBMs, DBNs, DBMs, and generalized
de-noising auto-encoders).
 Among the various subclasses of generative
or unsupervised deep networks, the energy-
based deep models are the most common.
 Deep belief network (DBN):
A probabilistic generative model composed of multiple
layers of stochastic, hidden variables. The top two
layers have undirected, symmetric connections between
them. The lower layers receive top-down, directed
connections from the layer above.
Boltzmann machine (BM):
A network of symmetrically connected, neuron-like
units that make stochastic decisions about whether to
be on or off.
Restricted Boltzmann machine (RBM):
A special type of BM consisting of a layer of visible
units and a layer of hidden units with no visible-visible
or hidden-hidden connections.
[Figures: deep belief network representation; Boltzmann machine; restricted Boltzmann machine]
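To make the RBM definition concrete, here is a minimal numpy sketch of training one with a single step of contrastive divergence (CD-1). All sizes, variable names, and the toy data are hypothetical illustrations, not a production implementation; the point is the bipartite structure with no visible-visible or hidden-hidden connections.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes: 6 binary visible units, 4 hidden units.
n_visible, n_hidden = 6, 4
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b_v = np.zeros(n_visible)   # visible biases
b_h = np.zeros(n_hidden)    # hidden biases

def cd1_update(v0, lr=0.1):
    """One CD-1 step on a batch of binary visible vectors."""
    global W, b_v, b_h
    # Positive phase: hidden activations given data (no hidden-hidden links).
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one Gibbs step back to a visible reconstruction.
    p_v1 = sigmoid(h0 @ W.T + b_v)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
    p_h1 = sigmoid(v1 @ W + b_h)
    # Approximate log-likelihood gradient: data statistics minus model statistics.
    n = v0.shape[0]
    W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / n
    b_v += lr * (v0 - v1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)

# Toy data: two repeated binary patterns.
data = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1]] * 16, dtype=float)
for _ in range(200):
    cd1_update(data)
```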
 Deep neural network (DNN):
A multilayer perceptron with many hidden layers,
whose weights are fully connected and are often
(although not always) initialized using either an
unsupervised or a supervised pre-training
technique. (In the literature prior to 2012, a DBN
was often used incorrectly to mean a DNN.)
 Deep auto-encoder:
A “discriminative” DNN whose output
targets are the data input itself rather than class
labels; hence an unsupervised learning model.
When trained with a de-noising criterion, a deep
auto-encoder is also a generative model and can
be sampled from.
[Figure: deep auto-encoder]
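A minimal numpy sketch of one de-noising auto-encoder layer follows; all sizes and the toy data are hypothetical. It trains the network to reconstruct the clean input from a corrupted copy, which is the de-noising criterion mentioned above; a deep auto-encoder stacks several such encoder/decoder pairs.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes: 8 inputs, 3 hidden units (one denoising layer;
# a *deep* auto-encoder stacks several such layers).
n_in, n_hid = 8, 3
W1 = 0.1 * rng.standard_normal((n_in, n_hid));  b1 = np.zeros(n_hid)
W2 = 0.1 * rng.standard_normal((n_hid, n_in));  b2 = np.zeros(n_in)

def train_step(x, lr=0.5, noise=0.3):
    global W1, b1, W2, b2
    # De-noising criterion: corrupt the input, reconstruct the clean input.
    x_tilde = x * (rng.random(x.shape) > noise)      # randomly zero entries
    h = sigmoid(x_tilde @ W1 + b1)                   # encoder
    x_hat = sigmoid(h @ W2 + b2)                     # decoder
    # Backprop of squared reconstruction error; targets are the clean input x.
    d_out = (x_hat - x) * x_hat * (1 - x_hat)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    n = x.shape[0]
    W2 -= lr * h.T @ d_out / n;       b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * x_tilde.T @ d_hid / n; b1 -= lr * d_hid.mean(axis=0)

data = rng.integers(0, 2, size=(32, n_in)).astype(float)
for _ in range(500):
    train_step(data)
```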
 Another prominent type of deep unsupervised
model with generative capability is the deep
Boltzmann machine, or DBM. A DBM contains many
layers of hidden variables, and has no connections
between the variables within the same layer.
 When the number of hidden layers of a DBM is
reduced to one, we obtain the restricted Boltzmann
machine (RBM).
 Another representative deep generative
network that can be used for unsupervised
(as well as supervised) learning is the sum–
product network or SPN.
 An SPN is a directed acyclic graph with the
observed variables as leaves, and with sum
and product operations as internal nodes in
the deep network.
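The sum and product semantics can be shown in a few lines. Below is a hypothetical two-variable SPN sketch: leaves are Bernoulli distributions over observed variables, a product node factorizes over distinct variables, and a sum node mixes product nodes with weights summing to one, so the root computes a valid joint probability.

```python
# A tiny sum-product network over two binary variables X1, X2.
# The structure and all parameter values are hypothetical.

def leaf(p_one, x):
    """Bernoulli leaf: probability of the observed value x."""
    return p_one if x == 1 else 1.0 - p_one

def spn_prob(x1, x2):
    # Product nodes combine distributions over *different* variables.
    prod_a = leaf(0.9, x1) * leaf(0.8, x2)
    prod_b = leaf(0.2, x1) * leaf(0.1, x2)
    # A sum node mixes the product nodes (weights sum to 1),
    # so the root is a valid joint probability.
    return 0.6 * prod_a + 0.4 * prod_b

# Probabilities over all assignments sum to 1.
total = sum(spn_prob(a, b) for a in (0, 1) for b in (0, 1))
print(spn_prob(1, 1), total)  # total == 1.0 up to float rounding
```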
 Recurrent neural networks (RNNs) can be
considered another class of deep networks
for unsupervised (as well as supervised)
learning, where the depth can be as large as
the length of the input data sequence.
 An RNN can be used to predict future samples
of a data sequence from the previous samples.
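The sketch below shows the unrolled forward computation of a simple RNN predicting the next sample at each step (numpy; the weights are random and untrained, and all sizes are hypothetical). The recurrence makes the effective depth equal to the sequence length, as noted above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sizes: scalar input, 8 hidden units, scalar prediction.
n_in, n_hid = 1, 8
W_xh = 0.5 * rng.standard_normal((n_in, n_hid))
W_hh = 0.5 * rng.standard_normal((n_hid, n_hid))
W_hy = 0.5 * rng.standard_normal((n_hid, n_in))

def rnn_predict(seq):
    """Unrolled forward pass: at each step, predict the next sample.
    The effective depth equals the sequence length."""
    h = np.zeros(n_hid)
    preds = []
    for x_t in seq:
        h = np.tanh(np.atleast_1d(x_t) @ W_xh + h @ W_hh)  # recurrent state
        preds.append(h @ W_hy)                             # next-step guess
    return np.array(preds)

seq = np.sin(np.linspace(0, 3, 20))
print(rnn_predict(seq).shape)  # one prediction per time step: (20, 1)
```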
 Deep generative graphical models are a
powerful tool in many applications due to
their capability of embedding domain
knowledge.
 However, they are often used with
inappropriate approximations in inference,
learning, prediction, and topology design, all
arising from inherent intractability in these
tasks for most real-world applications.
 Finally, dynamic or temporally recursive
generative models based on neural network
architectures can be found in the literature for
human motion modeling, and for natural language
and natural scene parsing.
 Many of the discriminative techniques for
supervised learning in signal and information
processing are shallow architectures, such as
HMMs and conditional random fields (CRFs).
 A CRF is intrinsically a shallow discriminative
architecture, characterized by the linear
relationship between the input features and the
transition features.
 Deep-structured CRFs have been developed by
stacking the output of each lower layer of the
CRF, together with the original input data, onto
its higher layer.
 Other major existing discriminative models
in speech recognition are based mainly on the
traditional neural network or MLP architecture,
using back-propagation learning with
random initialization.
 This line of work argues for the importance of
both the increased width of each layer of the
neural network and the increased depth.
 A new deep learning architecture, sometimes
called the deep stacking network (DSN), together
with its tensor variant and its kernel version,
has been developed; all of these focus on
discrimination with scalable, parallelizable,
block-wise learning relying on little or no
generative component.
[Figure: deep stacking network]
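A minimal numpy sketch of the stacking principle follows (the same idea underlies the deep-structured CRFs above): each module sees the raw input concatenated with the outputs of all earlier modules, and its upper-layer weights have a closed-form ridge-regression solution, which is what makes DSN learning block-wise and parallelizable. The random lower-layer weights, sizes, and toy data here are simplifying assumptions; actual DSNs initialize the lower weights more carefully.

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dsn_fit(X, Y, n_modules=3, n_hid=20, ridge=1e-2):
    """Stack simple modules: module k sees the raw input X concatenated
    with every earlier module's output. Upper weights U solved in closed
    form (ridge regression), so each block trains independently."""
    modules, outputs = [], []
    for _ in range(n_modules):
        feats = np.hstack([X] + outputs) if outputs else X
        W = 0.5 * rng.standard_normal((feats.shape[1], n_hid))  # lower weights
        H = sigmoid(feats @ W)                                  # hidden layer
        U = np.linalg.solve(H.T @ H + ridge * np.eye(n_hid), H.T @ Y)
        modules.append((W, U))
        outputs.append(H @ U)        # this module's output feeds the next block
    return modules

X = rng.standard_normal((64, 5))
Y = (X[:, :1] > 0).astype(float)     # toy binary target
modules = dsn_fit(X, Y)
```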
 Another type of discriminative deep
architecture is the convolutional neural
network (CNN), in which each module
consists of a convolutional layer and a
pooling layer. These modules are often
stacked one on top of another, or with a
DNN on top, to form a deep model.
 The weight sharing in the convolutional
layer, together with appropriately chosen
pooling schemes, endows the CNN with some
“invariance” properties.
[Figure: convolutional neural network]
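Below is a minimal numpy sketch of a single CNN module: a valid 2-D convolution with one shared kernel (the weight sharing), a ReLU nonlinearity, and non-overlapping max pooling. Sizes and data are hypothetical; real CNNs use many kernels and channels.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution: one shared kernel slides over the image,
    which is the weight sharing that underlies the CNN's invariance."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling over size x size windows."""
    H, W = fmap.shape
    H2, W2 = H // size, W // size
    return fmap[:H2*size, :W2*size].reshape(H2, size, W2, size).max(axis=(1, 3))

def cnn_module(image, kernel):
    # One CNN module: convolution, nonlinearity, then pooling.
    return max_pool(np.maximum(conv2d(image, kernel), 0.0))

rng = np.random.default_rng(4)
img = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))
print(cnn_module(img, kernel).shape)  # (3, 3)
```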
 CNNs have been found highly effective and are
commonly used in computer vision and image
recognition. More recently, with appropriate
changes that adapt the CNN designed for image
analysis to speech-specific properties, the CNN
has also been found effective for speech
recognition.
 The time-delay neural network (TDNN),
developed for early speech recognition, is a
special case and predecessor of the CNN in
which weight sharing is limited to one of the
two dimensions (the time dimension) and
there is no pooling layer.
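A sketch of one TDNN layer follows (numpy; the 13-dimensional frames, 5-frame context window, and output size are hypothetical). The same weights are applied at every time step, i.e., weight sharing along time only, and no pooling follows the convolution.

```python
import numpy as np

def tdnn_layer(frames, weights):
    """TDNN layer: weights are shared across time only (one of the two
    dimensions), and there is no pooling. `frames` is (T, n_features);
    `weights` is (context, n_features, n_out), covering a window of
    `context` consecutive frames."""
    T, _ = frames.shape
    context, _, n_out = weights.shape
    out = np.empty((T - context + 1, n_out))
    for t in range(out.shape[0]):
        window = frames[t:t+context]                 # (context, n_features)
        out[t] = np.tanh(np.einsum('cf,cfo->o', window, weights))
    return out

rng = np.random.default_rng(5)
frames = rng.standard_normal((50, 13))              # e.g., 13 features per frame
weights = 0.1 * rng.standard_normal((5, 13, 8))     # 5-frame context window
print(tdnn_layer(frames, weights).shape)            # (46, 8)
```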
 The model of hierarchical temporal memory
(HTM) is another variant and extension of the
CNN. The extension includes the following
aspects:
(1) Time or temporal dimension is introduced to
serve as the “supervision” information for
discrimination (even for static images);
(2) Both bottom-up and top-down information
flows are used, instead of just bottom-up in the
CNN; and
(3) A Bayesian probabilistic formalism is used for
fusing information and for decision making.
 A typical HTM network is a tree-shaped
hierarchy of levels that are composed of
smaller elements called nodes or columns.
 A single level in the hierarchy is also called a
region.
 Higher hierarchy levels often have fewer
nodes and therefore less spatial resolvability.
 Higher hierarchy levels can reuse patterns
learned at the lower levels by combining
them to memorize more complex patterns.
[Figure: an example of an HTM hierarchy used for image recognition]
 The learning architecture for bottom-up,
detection-based speech recognition, proposed in
2004 and developed further since then, notably
using the DBN–DNN technique, can also be placed
in the discriminative or supervised-learning deep
architecture category.
 The term “hybrid” generally indicates the result
of combining components.
 For this category, it refers to deep architectures
that either comprise or make use of both generative
and discriminative model components.
 In the existing hybrid architectures, the
generative component is mostly exploited to
help with discrimination, which is the goal of
the hybrid architecture.
 From the optimization viewpoint, generative
models trained in an unsupervised fashion can
provide excellent initialization points for
highly nonlinear parameter estimation problems.
 From the regularization perspective,
unsupervised-learning models can effectively
provide a prior on the set of functions
representable by the model.
 The DBN, a generative deep network for
unsupervised learning discussed earlier, can
be converted to and used as the initial model
of a DNN for supervised learning with the
same network structure, which is then further
discriminatively trained or fine-tuned using
the target labels provided.
 We can consider this a DBN-DNN model, where a
model trained on unlabeled data helps make the
discriminative model effective for supervised
learning, as sketched below.
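Here is a compact numpy sketch of that pipeline under simplifying assumptions: each layer is pre-trained as an RBM with CD-1 (visible biases omitted for brevity), and the supervised stage trains only an added logistic output layer, whereas a full DBN-DNN would back-propagate through all layers. All sizes and the toy data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def pretrain_rbm(data, n_hid, epochs=50, lr=0.1):
    """Greedy unsupervised pre-training of one layer with CD-1 (simplified)."""
    W = 0.01 * rng.standard_normal((data.shape[1], n_hid))
    b = np.zeros(n_hid)
    for _ in range(epochs):
        p_h = sigmoid(data @ W + b)
        h = (rng.random(p_h.shape) < p_h).astype(float)
        v1 = sigmoid(h @ W.T)                 # reconstruction (biases omitted)
        p_h1 = sigmoid(v1 @ W + b)
        W += lr * (data.T @ p_h - v1.T @ p_h1) / len(data)
        b += lr * (p_h - p_h1).mean(axis=0)
    return W, b

def dbn_to_dnn(X, y, layer_sizes=(16, 8)):
    """Stack RBM-pretrained layers, then fine-tune an added logistic
    output layer on the target labels (full training would also
    back-propagate through the pretrained layers)."""
    weights, h = [], X
    for n_hid in layer_sizes:                 # unsupervised, layer by layer
        W, b = pretrain_rbm(h, n_hid)
        weights.append((W, b))
        h = sigmoid(h @ W + b)
    w_out = np.zeros(h.shape[1])              # supervised stage
    for _ in range(200):
        p = sigmoid(h @ w_out)
        w_out -= 0.5 * h.T @ (p - y) / len(y) # logistic-loss gradient step
    return weights, w_out

X = rng.integers(0, 2, size=(64, 12)).astype(float)
y = X[:, 0]                                   # toy labels
dbn_to_dnn(X, y)
```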
 Another example of the hybrid deep network has
been developed in the literature, where the DNN
weights are also initialized from a generative DBN
but are further fine-tuned with a sequence-level
discriminative criterion: the conditional
probability of the label sequence given the input
feature sequence, instead of the commonly used
frame-level cross-entropy criterion.
 It can be shown that such a DNN–CRF is
equivalent to a hybrid deep architecture of a DNN
and an HMM, i.e., a DNN-HMM.
 Another example of hybrid deep networks is the
use of generative models of DBNs to pre-train
deep convolutional neural networks.
 As with the fully connected DNN discussed earlier,
pre-training also helps to improve the
performance of deep CNNs over random
initialization.
 Pre-training DNNs or CNNs using a set of
regularized deep auto-encoders, including
de-noising auto-encoders, contractive
auto-encoders, and sparse auto-encoders, is a
similar example of hybrid deep networks.
THANK YOU
