
DEEP & REINFORCEMENT NETWORKS (ITITE63)

UNIT 4
Recurrent Neural Networks - Recurrent Neural network model, Different types of
RNNs, vanishing gradients with RNN, LSTM, Gated Recurrent units, Bidirectional
LSTM.

What Is a Recurrent Neural Network (RNN)?


A Recurrent Neural Network (RNN) works on the principle of saving the output of a layer and feeding it back to the input, so that the prediction at each step can depend on what the network has already seen.

Below is how you can convert a Feed-Forward Neural Network into a Recurrent Neural
Network:

Fig: Simple Recurrent Neural Network

The nodes in the different layers of the neural network are compressed to form a single recurrent layer. A, B, and C are the parameters of the network.
Fig: Fully connected Recurrent Neural Network

Here, “x” is the input layer, “h” is the hidden layer, and “y” is the output layer. A, B, and C are the network parameters used to improve the output of the model. At any given time t, the hidden state combines the current input x(t) with information carried over from the previous step. The output at each step is fed back into the network, which is what gives the model its memory.
Fig: Fully connected Recurrent Neural Network
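To make this concrete, below is a minimal NumPy sketch of the recurrent step, assuming A is the input-to-hidden weight matrix, B the hidden-to-hidden (recurrent) weight matrix, and C the hidden-to-output weight matrix; the dimensions are illustrative, not taken from the figure.

import numpy as np

# Illustrative dimensions (not from the figure): 4-dimensional inputs, 3 hidden units, 2 outputs.
A = np.random.randn(3, 4) * 0.1   # input-to-hidden weights (assumed to correspond to "A")
B = np.random.randn(3, 3) * 0.1   # hidden-to-hidden (recurrent) weights (assumed "B")
C = np.random.randn(2, 3) * 0.1   # hidden-to-output weights (assumed "C")

def rnn_forward(inputs):
    # The hidden state h is carried from one step to the next: this loop is the feedback.
    h = np.zeros(3)
    outputs = []
    for x in inputs:
        h = np.tanh(A @ x + B @ h)    # combine the current input with the previous hidden state
        outputs.append(C @ h)         # output at this time step
    return outputs

outputs = rnn_forward([np.random.randn(4) for _ in range(5)])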

Why Recurrent Neural Networks?

RNNs were created to address a few issues with feed-forward neural networks:

• Cannot handle sequential data

• Considers only the current input

• Cannot memorize previous inputs

The solution to these issues is the RNN. An RNN can handle sequential data, accepting both the current input and previously received inputs. RNNs can memorize previous inputs due to their internal memory.

How Do Recurrent Neural Networks Work?

In recurrent neural networks, information cycles through a loop in the middle (hidden) layer.
Fig: Working of Recurrent Neural Network

The input layer ‘x’ takes in the input to the neural network, processes it, and passes it on to the middle layer.

The middle layer ‘h’ can consist of multiple hidden layers, each with its own activation functions, weights, and biases. In a plain feed-forward network these hidden layers are independent of one another, so the network has no memory of earlier inputs; this is exactly the situation a recurrent neural network is designed to fix.

A recurrent neural network standardizes the activation functions, weights, and biases so that each hidden layer has the same parameters. Then, instead of creating multiple distinct hidden layers, it creates one layer and loops over it as many times as required, carrying the hidden state forward at each step.
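As a rough illustration of this weight sharing, the sketch below (using PyTorch's nn.RNN, with made-up sizes) applies one and the same recurrent layer to every step of a 10-step sequence rather than building ten separate layers.

import torch
import torch.nn as nn

# One recurrent layer with a single set of weights, reused at every time step.
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)   # sizes are illustrative

x = torch.randn(4, 10, 8)      # a batch of 4 sequences, each with 10 steps of 8 features
out, h_n = rnn(x)              # the same weights are looped over all 10 steps

print(out.shape)               # torch.Size([4, 10, 16]) - hidden state at every step
print(h_n.shape)               # torch.Size([1, 4, 16])  - final hidden state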

Feed-Forward Neural Networks vs Recurrent Neural Networks

A feed-forward neural network allows information to flow only in the forward direction, from
the input nodes, through the hidden layers, and to the output nodes. There are no cycles or loops
in the network.

Below is what a simplified representation of a feed-forward neural network looks like:


Fig: Feed-forward Neural Network

In a feed-forward neural network, decisions are based only on the current input. It does not memorize past data, so earlier inputs cannot influence later predictions. Feed-forward neural networks are used in general regression and classification problems.

Applications of Recurrent Neural Networks

1. Image Captioning

RNNs are used to caption an image by analysing the activities present.

2. Time Series Prediction

Any time series problem, like predicting the prices of stocks in a particular month, can be
solved using an RNN.

3. Natural Language Processing

Text mining and Sentiment analysis can be carried out using an RNN for Natural Language
Processing (NLP).

Advantages of Recurrent Neural Network

Recurrent Neural Networks (RNNs) have several advantages over other types of neural
networks, including:

1. Ability To Handle Variable-Length Sequences

RNNs are designed to handle input sequences of variable length, which makes them well-suited
for tasks such as speech recognition, natural language processing, and time series analysis.
2. Memory Of Past Inputs

RNNs have a memory of past inputs, which allows them to capture information about the
context of the input sequence. This makes them useful for tasks such as language modeling,
where the meaning of a word depends on the context in which it appears.

3. Parameter Sharing

RNNs share the same set of parameters across all time steps, which reduces the number of
parameters that need to be learned and can lead to better generalization.

4. Non-Linear Mapping

RNNs use non-linear activation functions, which allows them to learn complex, non-linear
mappings between inputs and outputs.

5. Sequential Processing

RNNs process input sequences one step at a time, which matches the natural structure of data such as text, speech, and time series, and lets each prediction depend on everything seen so far.

6. Flexibility

RNNs can be adapted to a wide range of tasks and input types, including text, speech, and
image sequences.

7. Improved Accuracy

RNNs have been shown to achieve state-of-the-art performance on a variety of sequence modeling tasks, including language modeling, speech recognition, and machine translation.

These advantages make RNNs a powerful tool for sequence modeling and analysis, and have
led to their widespread use in a variety of applications, including natural language processing,
speech recognition, and time series analysis.

Disadvantages of Recurrent Neural Network


Although Recurrent Neural Networks (RNNs) have several advantages, they also have some
disadvantages. Here are some of the main disadvantages of RNNs:

1. Vanishing And Exploding Gradients

RNNs can suffer from the problem of vanishing or exploding gradients, which can make it
difficult to train the network effectively. This occurs when the gradients of the loss function
with respect to the parameters become very small or very large as they propagate through time.

2. Computational Complexity

RNNs can be computationally expensive to train, especially when dealing with long sequences.
This is because the network has to process each input in sequence, which can be slow.

3. Difficulty In Capturing Long-Term Dependencies

Although RNNs are designed to capture information about past inputs, they can struggle to
capture long-term dependencies in the input sequence. This is because the gradients can
become very small as they propagate through time, which can cause the network to forget
important information.

4. Lack Of Parallelism

RNNs are inherently sequential, which makes it difficult to parallelize the computation. This
can limit the speed and scalability of the network.

5. Difficulty In Choosing the Right Architecture

There are many different variants of RNNs, each with its own advantages and disadvantages.
Choosing the right architecture for a given task can be challenging, and may require extensive
experimentation and tuning.

6. Difficulty In Interpreting the Output

The output of an RNN can be difficult to interpret, especially when dealing with complex inputs
such as natural language or audio. This can make it difficult to understand how the network is
making its predictions.
These disadvantages are important to weigh when deciding whether to use an RNN for a given task.
However, many of these issues can be addressed through careful design and training of the
network and through techniques such as regularization and attention mechanisms.

Types of Recurrent Neural Networks

There are four types of Recurrent Neural Networks:

1. One to One

2. One to Many

3. Many to One

4. Many to Many

One to One RNN

This type of neural network is known as the vanilla neural network. It is used for general machine learning problems that have a single input and a single output.
One to Many RNN

This type of neural network has a single input and multiple outputs. An example of this is image captioning, where one image produces a sequence of words.

Many to One RNN

This RNN takes a sequence of inputs and generates a single output. Sentiment analysis is a
good example of this kind of network where a given sentence can be classified as expressing
positive or negative sentiments.
Many to Many RNN

This RNN takes a sequence of inputs and generates a sequence of outputs. Machine translation is one example.
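As an illustration of the many-to-one case described above, here is a minimal, hypothetical sentiment classifier sketched with PyTorch; the class name, vocabulary size, and layer sizes are all made up for the example.

import torch
import torch.nn as nn

class ManyToOneRNN(nn.Module):
    # Hypothetical many-to-one model: a sequence of tokens in, one sentiment label out.
    def __init__(self, vocab_size=5000, embed_dim=64, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):                  # token_ids: (batch, seq_len)
        embedded = self.embed(token_ids)           # (batch, seq_len, embed_dim)
        _, h_n = self.rnn(embedded)                # final hidden state: (1, batch, hidden_dim)
        return self.classifier(h_n.squeeze(0))     # one prediction per input sequence

model = ManyToOneRNN()
logits = model(torch.randint(0, 5000, (8, 20)))    # 8 sentences of 20 tokens each
print(logits.shape)                                # torch.Size([8, 2])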
Two Issues of Standard RNNs

1. Vanishing Gradient Problem

Recurrent Neural Networks enable you to model time-dependent and sequential data problems, such as stock market prediction, machine translation, and text generation. You will find, however, that RNNs are hard to train because of gradient problems.

RNNs suffer from the problem of vanishing gradients. The gradients carry the information used to update the RNN's parameters, and when the gradient becomes too small, the parameter updates become insignificant. This makes learning from long data sequences difficult.
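A rough numerical illustration of why this happens: backpropagation through time multiplies many per-step gradient factors together, so if each factor is even slightly below 1, the product shrinks exponentially with sequence length (the factor 0.9 below is arbitrary).

# If the per-step gradient factor is slightly below 1, the overall gradient
# shrinks exponentially with the number of time steps it is propagated through.
factor = 0.9
for steps in (10, 50, 100):
    print(steps, factor ** steps)   # ~0.35, ~0.0052, ~0.000027 -> updates become negligible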

2. Exploding Gradient Problem

While training a neural network, if the gradient tends to grow exponentially instead of decaying, this is called an exploding gradient. This problem arises when large error gradients accumulate, resulting in very large updates to the model weights during training.

Long training times, poor performance, and low accuracy are the major consequences of these gradient problems.
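One common remedy for the exploding-gradient side of this, alongside the architectural solutions discussed next, is gradient clipping; the sketch below shows the idea with PyTorch's clip_grad_norm_, using a placeholder model, data, and loss.

import torch
import torch.nn as nn

# Placeholder model, data, and targets purely to show where clipping fits in a training step.
model = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

x = torch.randn(4, 10, 8)
target = torch.randn(4, 10, 16)

output, _ = model(x)
loss = criterion(output, target)

optimizer.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)   # rescale gradients if their norm exceeds 1
optimizer.step()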

Gradient Problem Solutions


The most popular and effective way to deal with gradient problems is the Long Short-Term Memory (LSTM) network.

First, let’s understand Long-Term Dependencies.

Suppose you want to predict the last word in the text: “The clouds are in the ______.”

The most obvious answer to this is the “sky.” We do not need any further context to predict the
last word in the above sentence.

Consider this sentence: “I have been staying in Spain for the last 10 years…I can speak fluent
______.”

The word you predict will depend on the previous few words in context. Here, you need the
context of Spain to predict the last word in the text, and the most suitable answer to this
sentence is “Spanish.” The gap between the relevant information and the point where it's
needed may have become very large. LSTMs help you solve this problem.

Common Activation Functions

Recurrent Neural Networks (RNNs) use activation functions just like other neural networks to
introduce non-linearity to their models. Here are some common activation functions used in
RNNs:

Sigmoid Function:

The sigmoid function is commonly used in RNNs. It has a range between 0 and 1, which makes
it useful for binary classification tasks. The formula for the sigmoid function is:

σ(x) = 1 / (1 + e^(-x))
Hyperbolic Tangent (Tanh) Function:

The tanh function is also commonly used in RNNs. It has a range between -1 and 1, which
makes it useful for non-linear classification tasks. The formula for the tanh function is:

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

Rectified Linear Unit (Relu) Function:

The ReLU function is a non-linear activation function that is widely used in deep neural
networks. It has a range between 0 and infinity, which makes it useful for models that require
positive outputs. The formula for the ReLU function is:

ReLU(x) = max(0, x)

Leaky Relu Function:

The Leaky ReLU function is similar to the ReLU function, but it introduces a small slope to
negative values, which helps to prevent "dead neurons" in the model. The formula for the Leaky
ReLU function is:

Leaky ReLU(x) = max(0.01x, x)

Softmax Function:

The softmax function is often used in the output layer of RNNs for multi-class classification
tasks. It converts the network output into a probability distribution over the possible classes.
The formula for the softmax function is:

softmax(x_i) = e^(x_i) / ∑_j e^(x_j)

These are just a few examples of the activation functions used in RNNs. The choice of
activation function depends on the specific task and the model's architecture.
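For reference, the formulas above can be written out directly in NumPy as below; the max-subtraction in softmax is a standard numerical-stability addition not shown in the formula.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)                      # same as (e^x - e^(-x)) / (e^x + e^(-x))

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    return np.maximum(slope * x, x)

def softmax(x):
    shifted = x - np.max(x)                # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / np.sum(exps)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
for fn in (sigmoid, tanh, relu, leaky_relu, softmax):
    print(fn.__name__, fn(x))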

Introduction to Long Short-Term Memory

➢ Long Short-Term Memory (LSTM) is a kind of recurrent neural network. In an RNN, the output from the previous step is fed as input to the current step.
➢ LSTM was designed by Hochreiter & Schmidhuber. It tackles the long-term dependency problem of RNNs, in which an RNN cannot predict words that require context stored far back in the sequence, even though it can make accurate predictions from recent information.
➢ As the gap length increases, the performance of a plain RNN degrades. An LSTM can, by design, retain information for a long period of time. It is used for processing, predicting, and classifying on the basis of time-series data.
➢ Long Short-Term Memory (LSTM) is a type of Recurrent Neural Network (RNN)
that is specifically designed to handle sequential data, such as time series, speech, and
text.
➢ LSTM networks are capable of learning long-term dependencies in sequential data,
which makes them well suited for tasks such as language translation, speech
recognition, and time series forecasting.
A traditional RNN has a single hidden state that is passed through time, which can make it
difficult for the network to learn long-term dependencies. LSTMs address this problem by
introducing a memory cell, which is a container that can hold information for an extended
period of time. The memory cell is controlled by three gates: the input gate, the forget gate,
and the output gate. These gates decide what information to add to, remove from, and output
from the memory cell.
The input gate controls what information is added to the memory cell. The forget gate
controls what information is removed from the memory cell. And the output gate controls
what information is output from the memory cell. This allows LSTM networks to selectively
retain or discard information as it flows through the network, which allows them to learn
long-term dependencies.
LSTMs can be stacked to create deep LSTM networks, which can learn even more complex
patterns in sequential data. LSTMs can also be used in combination with other neural network
architectures, such as Convolutional Neural Networks (CNNs) for image and video analysis.
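A minimal sketch of a stacked LSTM along these lines, using PyTorch's nn.LSTM with made-up sizes (two layers, plus a linear head used here for a single-step forecast):

import torch
import torch.nn as nn

# Two stacked LSTM layers followed by a linear head (all sizes are illustrative).
lstm = nn.LSTM(input_size=32, hidden_size=64, num_layers=2, batch_first=True)
head = nn.Linear(64, 1)

x = torch.randn(8, 50, 32)           # 8 sequences, 50 time steps, 32 features per step
out, (h_n, c_n) = lstm(x)            # out: (8, 50, 64); h_n and c_n: (2, 8, 64), one per layer

prediction = head(out[:, -1, :])     # last time step's hidden state -> one forecast value
print(prediction.shape)              # torch.Size([8, 1])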

Structure Of LSTM:

An LSTM has a chain structure in which each repeating unit contains four interacting neural network layers and memory blocks called cells.
Information is retained by the cells, and the memory manipulations are done by the gates. There are three gates –

1. Forget Gate: The information that is no longer useful in the cell state is removed with the forget gate. Two inputs, x_t (the input at the current time step) and h_(t-1) (the previous cell output), are fed to the gate and multiplied with weight matrices, followed by the addition of a bias. The result is passed through a sigmoid activation, which gives a value between 0 and 1 for each element of the cell state. Values close to 0 mean that piece of information is forgotten; values close to 1 mean it is retained for future use.
2. Input gate: The addition of useful information to the cell state is done by the input gate. First, the information is regulated using a sigmoid function, which filters the values to be remembered from the inputs h_(t-1) and x_t, similar to the forget gate. Then, a candidate vector is created with a tanh function, giving values between -1 and +1 based on h_(t-1) and x_t. Finally, the candidate vector and the regulated (sigmoid) values are multiplied element-wise, and the result is added to the cell state.

3. Output gate: The task of extracting useful information from the current cell state to present as output is done by the output gate. First, a vector is generated by applying the tanh function to the cell state. Then, the information is regulated using a sigmoid function that filters the values to be output, based on the inputs h_(t-1) and x_t. Finally, the vector and the regulated values are multiplied element-wise and sent as the output, which also serves as the input to the next cell.
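Putting the three gates together, below is a minimal NumPy sketch of a single LSTM time step following the description above; the weight matrices (W_f, W_i, W_c, W_o) and zero biases are randomly initialized placeholders, not trained values.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

hidden, inputs = 4, 3                               # illustrative sizes
rng = np.random.default_rng(0)
W_f, W_i, W_c, W_o = (rng.normal(size=(hidden, hidden + inputs)) * 0.1 for _ in range(4))
b_f = b_i = b_c = b_o = np.zeros(hidden)

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])               # previous output h_(t-1) joined with current input x_t
    f = sigmoid(W_f @ z + b_f)                      # forget gate: values near 0 drop cell-state entries
    i = sigmoid(W_i @ z + b_i)                      # input gate: which new values to let in
    c_tilde = np.tanh(W_c @ z + b_c)                # candidate values in the range -1 to +1
    c = f * c_prev + i * c_tilde                    # updated cell state
    o = sigmoid(W_o @ z + b_o)                      # output gate: which parts of the cell state to expose
    h = o * np.tanh(c)                              # new hidden state, passed to the next cell
    return h, c

h, c = lstm_step(rng.normal(size=inputs), np.zeros(hidden), np.zeros(hidden))
print(h, c)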
Some of the famous applications of LSTM include:

Long Short-Term Memory (LSTM) is a powerful type of Recurrent Neural Network (RNN)
that has been used in a wide range of applications. Here are a few famous applications of
LSTM:
1. Language Modeling: LSTMs have been used for natural language processing
tasks such as language modeling, machine translation, and text summarization.
They can be trained to generate coherent and grammatically correct sentences by
learning the dependencies between words in a sentence.
2. Speech Recognition: LSTMs have been used for speech recognition tasks such
as transcribing speech to text and recognizing spoken commands. They can be
trained to recognize patterns in speech and match them to the corresponding text.
3. Time Series Forecasting: LSTMs have been used for time series forecasting tasks
such as predicting stock prices, weather, and energy consumption. They can learn
patterns in time series data and use them to make predictions about future events.
4. Anomaly Detection: LSTMs have been used for anomaly detection tasks such as
detecting fraud and network intrusion. They can be trained to identify patterns in
data that deviate from the norm and flag them as potential anomalies.
5. Recommender Systems: LSTMs have been used for recommendation tasks such
as recommending movies, music, and books. They can learn patterns in user behavior
and use them to make personalized recommendations.
6. Video Analysis: LSTMs have been used for video analysis tasks such as object
detection, activity recognition, and action classification. They can be used in
combination with other neural network architectures, such as Convolutional Neural
Networks (CNNs), to analyze video data and extract useful information.
