ARTIFICIAL
INTELLIGENCE
WHERE DATA MEETS INTELLIGENCE
ANN - IMITATE HUMAN BRAIN!!!
• ANNs are computational models inspired by
the human brain's structure and function.
• An ANN acts like a brain: it is organized into
interconnected layers, such as the input layer,
one or more hidden layers, and the output layer.
BIOLOGICAL NEURON
• Biological neurons are of many different types
• Dendrites receive signals from adjacent neurons, the cell
body processes the input and generates activations that are
passed through the axon to other neurons.
• The cell body can perform complex non-linear computations
• Synapses are not a single weight but a complex non-linear
dynamical system
PERCEPTRON
• A perceptron is the simplest form of an artificial
neuron and the fundamental building block of
neural networks.
• Introduced by Frank Rosenblatt in 1958, it’s a
mathematical model inspired by biological
neurons.
• It’s a binary classifier that makes a decision by
mapping input features to an output (0 or 1).
• Input Features: Multiple features representing
input data characteristics.
• Weights: Each feature is assigned a weight
determining its influence on output.
• Summation Function: Calculates weighted sum of
inputs, combining them with respective weights.
• Activation Function: Passes weighted sum through
Heaviside step function to produce binary output (0
or 1).
Basic Components of Perceptron
• Output: Determined by activation function,
often used for binary classification tasks.
• Bias: Helps make adjustments independent of
input, improving learning flexibility.
• Learning Algorithm: Adjusts weights and bias
using a learning algorithm like Perceptron
Learning Rule.
Basic Components of Perceptron
• An artificial neuron is connected to an input vector x=[x1 x2 x3 x4]
• The input is multiplied with a weight vector w=[w1 w2 w3 w4]
• The weighted inputs are summed up: weighted sum = x1*w1 + x2*w2 +
x3*w3 + x4*w4
• A bias term (here, 1) is added to form the net sum: net sum = x1*w1 + x2*w2
+ x3*w3 + x4*w4 + 1
• The net sum is finally passed through an activation function to produce
the final output of the neuron
How does a Perceptron Work?
UNIT - STEP FUNCTION
A simple activation function that produces a binary output from a given input
n. With respect to a chosen threshold t, it is defined as:
f(n) = 0 for n < t, and f(n) = 1 otherwise
HOW DOES A PERCEPTRON WORK?
Step 1: Compute the weighted sum:
Step 2: Apply activation (step function):
Training the
Perceptron:
• Initialize: Start with weights W = 0 and bias b = 0
• For Each Data Point: compute the prediction and, if it is wrong,
update the weights and bias with the Perceptron Learning Rule
(a sketch follows below).
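A minimal NumPy sketch of the perceptron forward pass (weighted sum + step activation) and the Perceptron Learning Rule; the AND-gate data, learning rate, and epoch count are illustrative assumptions.

```python
import numpy as np

def step(n, t=0.0):
    """Unit-step activation: 0 if n < t, else 1."""
    return np.where(n < t, 0, 1)

def train_perceptron(X, y, lr=0.1, epochs=10):
    """Perceptron Learning Rule: w <- w + lr*(y - y_hat)*x, b <- b + lr*(y - y_hat)."""
    w = np.zeros(X.shape[1])        # initialize weights to 0
    b = 0.0                         # initialize bias to 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            y_hat = step(np.dot(w, xi) + b)   # Step 1 + Step 2
            w += lr * (yi - y_hat) * xi       # update weights
            b += lr * (yi - y_hat)            # update bias
    return w, b

# Toy example (assumed data): learn the logical AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print(step(X @ w + b))   # converges to [0 0 0 1] for this separable data
```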
ACTIVATION FUNCTIONS
Activation functions introduce non-linearity, enabling neural networks to solve complex
problems.
ACTIVATION FUNCTIONS
Sigmoid: sigma(x) = 1 / (1 + e^(-x)) outputs 0 to 1, used for binary classification (e.g., spam detection).
Graph shows a smooth S-curve.
Softmax: softmax(x_i) = e^(x_i) / (sum over all j of e^(x_j)) outputs probabilities (summing to 1), used
for multi-class classification (e.g., image labeling).
Tanh: tanh(x) = 2 / (1 + e^(-2x)) - 1 outputs -1 to 1, similar to Sigmoid but centered at 0, good for
balanced data. Graph shows a steeper S-curve.
ReLU: f(x) = max(0, x) outputs the input value if it is positive, otherwise zero. In simple words, ReLU
only returns a positive output or zero.
Leaky ReLU: a variant of ReLU that addresses the dying-neurons issue. Leaky ReLU allows a non-zero
output even for negative inputs by introducing a small slope.
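A minimal NumPy sketch of the activation functions above; the test vector and the Leaky ReLU slope (alpha = 0.01) are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))            # outputs in (0, 1)

def softmax(x):
    e = np.exp(x - np.max(x))                  # subtract max for numerical stability
    return e / e.sum()                         # outputs sum to 1

def tanh(x):
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1  # equivalent to np.tanh(x), outputs in (-1, 1)

def relu(x):
    return np.maximum(0, x)                    # positive inputs pass through, negatives become 0

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)       # small slope alpha for negative inputs

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])      # assumed test vector
print(sigmoid(x), tanh(x), relu(x), leaky_relu(x), softmax(x), sep="\n")
```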
MULTILAYER PERCEPTRON
• The input X is fed into the first layer, which is a
multidimensional perceptron with a weight
matrix W1 and bias vector b1.
• The output of that layer is then fed into the
second layer, which is again a perceptron with
another weight matrix W2 and bias vector b2.
• This process continues for each of the L layers
until we reach the output layer. We refer to the
last layer as the output layer and to every other
layer as a hidden layer.
FORWARD PROPAGATION IN MLP
• An MLP with one hidden layer computes the
function y = f2(W2 · f1(W1 · x + b1) + b2).
• An MLP with two hidden layers computes the
function y = f3(W3 · f2(W2 · f1(W1 · x + b1) + b2) + b3).
• Like this, during forward propagation the
activations/outputs from the neurons of one layer
are passed to the neurons of the next layer.
• Generally, an MLP with L − 1 hidden layers
computes a nested composition of L such layer
functions, as in the sketch below.
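A minimal NumPy sketch of this layer-by-layer forward pass; the layer sizes, random weights, and the use of ReLU at every layer are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0, z)

def forward(x, weights, biases):
    """Forward propagation: the activation of one layer feeds the next, a = f(W a + b)."""
    a = x
    for W, b in zip(weights, biases):
        a = relu(W @ a + b)
    return a

# Illustrative MLP: 4 inputs -> 5 hidden -> 3 hidden -> 2 outputs
sizes = [4, 5, 3, 2]
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

x = rng.normal(size=4)
print(forward(x, weights, biases))
```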
BACKWARD PROPAGATION IN MLP
• During Backward Propagation (during Training) - the error of the loss function is calculated at
the output layer.
• We need to minimize the amount of error made by the neural network. This is done
through an iterative process known as training.
• The error of the loss function can be minimized by updating the network parameters
(weights and biases) of the connections between neurons.
• The gradients of the loss function are calculated w.r.t. the weights using the Backpropagation
algorithm.
• Applying Gradient Descent, the parameter update is performed (updating weights and
biases), as in the sketch below.
• These steps are repeated up to a fixed number of iterations or until the difference of errors
in two successive iterations is less than a predetermined threshold (meeting the
convergence criterion).
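A minimal sketch of this iterative train-and-update loop, using a single linear neuron with a mean-squared-error loss; the synthetic data, learning rate, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data for a single linear neuron trained with MSE loss
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.3

w = np.zeros(3)
b = 0.0
lr = 0.1

for epoch in range(200):
    y_hat = X @ w + b                  # forward pass
    error = y_hat - y                  # error at the output
    grad_w = X.T @ error / len(X)      # gradient of the (halved) MSE loss w.r.t. w
    grad_b = error.mean()              # gradient w.r.t. b
    w -= lr * grad_w                   # gradient-descent update of the weights
    b -= lr * grad_b                   # gradient-descent update of the bias

print(w, b)   # approaches true_w and 0.3
```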
EXAMPLE
Net Input to the nodes at the hidden layer:
• h1= 0.81 * 0.5 + 0.12 * 1 + 0.92 * 0.8
• h2= 0.33 * 0.5 + 0.44 * 1 + 0.72 * 0.8
• h3= 0.29 * 0.5 + 0.22 * 1 + 0.53 * 0.8
• h4= 0.37 * 0.5 + 0.12 * 1 + 0.27 * 0.8
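The same net-input computation written as a NumPy matrix-vector product, using the weights and inputs from the example above.

```python
import numpy as np

x = np.array([0.5, 1.0, 0.8])           # inputs x1, x2, x3 from the example
W = np.array([[0.81, 0.12, 0.92],       # weights into h1
              [0.33, 0.44, 0.72],       # weights into h2
              [0.29, 0.22, 0.53],       # weights into h3
              [0.37, 0.12, 0.27]])      # weights into h4
net = W @ x
print(net)   # [1.261, 1.181, 0.789, 0.521]
```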
COMPUTING NET INPUT TO NEURONS AT HIDDEN LAYER
The image depicts a neural network layer
where biases b1, b2, b3 are added to
hidden nodes h1, h2, h3 alongside
weighted inputs from x1, x2, x3. These
biases shift the activation function, helping
the network better fit the data by adjusting
the output of each neuron.
COMPUTING NET INPUT TO NEURONS AT HIDDEN LAYER
Passes through an activation
function to generate the
output of the neurons.
WEIGHTS BETWEEN INPUT-HIDDEN LAYERS
Inputs x1, x2, x3 are linked to hidden nodes h1,
h2, h3, h4 via weights. The weight matrix
defines the strength of these connections,
which is crucial for transforming input data
in the network.
COMPUTING NET INPUT TO NEURONS AT OUTPUT LAYER
The image shows backward propagation in
a neural network: the error between
predicted outputs o1, o2 and actual targets is
calculated and propagated backward from the
output layer to the hidden layers and inputs x1,
x2, x3. This process adjusts weights and
biases to minimize the error, enabling the
network to learn effectively.
COMPUTING NET INPUT TO NEURONS AT OUTPUT LAYER
• The diagram shows an artificial Neuron
connected with an input vector x=[x1 x2 x3 x4]
• The input is multiplied with a weight vector
w=[w1 w2 w3 w4]
• The weighted input is summed up, weighted sum
= x1* w1 + x2* w2 + x3* w3 + x4* w4
• A bias term 1 is added to form a net sum. Net
sum = x1* w1 + x2* w2 + x3* w3 + x4* w4+ 1
• The net sum is finally passed through an
activation function to produce the final output
of the neuron
IMPORTANT POINTS TO REMEMBER ABOUT
MLP
MLPs are connectionist computational models
• We can solve classification and regression problems
• MLPs can compose Boolean functions
• MLPs can compose real-valued functions
• MLPs are Universal function approximators (Universal approximation Theorem)
• MLPs can represent any function if
> it is sufficiently wide (number of neurons in a hidden layer)
> it is sufficiently deep (number of hidden layers)
> depth can be traded off for (sometimes) exponential growth of the width of the network
• Optimal width and depth depend on the number of input variables and the complexity of the
function it is trying to model
ANN – REGRESSION AND CLASSIFICATION
In regression, ANNs aim to predict continuous numerical values, while in
classification, they aim to categorize data into discrete classes.
• Goal: To predict a continuous numerical output.
• Example: Predicting house prices, stock prices,
or temperature.
• Output Layer: A linear activation function is
typically used in the output layer.
• Loss Function: Mean Squared Error (MSE) is
commonly used as the loss function.
• Applications: Predicting quantities, estimating
values, and forecasting trends.
REGRESSION
• Goal: To categorize data into discrete classes or
categories.
• Example: Classifying emails as spam or not spam,
recognizing handwritten digits, or identifying
image objects.
• Output Layer: A Softmax activation function is
used to produce probabilities for each class.
• Loss Function: Cross-entropy loss is typically used
as the loss function.
• Applications: Pattern recognition, object
detection, and sentiment analysis.
CLASSIFICATION
LOSS FUNCTIONS
• Measures how good or bad model predictions are compared to actual
results
• Outputs a single number showing error magnitude — smaller is better
• Used to guide model training (e.g., via Gradient Descent)
• Helps evaluate model performance and influences learning behavior
Why are Loss Functions
Important?
• Guide the optimization of model parameters
• Measure difference between predicted and true values
• Different loss functions suit different tasks and affect model learning
REGRESSION LOSS FUNCTIONS
Used for predicting continuous values (e.g., price,
age)
MEAN SQUARED ERROR (MSE) LOSS
• Average of squared differences between predicted and actual values
• Sensitive to outliers
MEAN ABSOLUTE ERROR (MAE) LOSS
• Average of absolute differences between predicted and actual values
• Less sensitive to outliers but not differentiable at zero
HUBER LOSS
• Combines MSE and MAE benefits
• Less sensitive to outliers and differentiable everywhere
• Requires tuning of parameter δ
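A minimal NumPy sketch of the three regression losses; the sample arrays and the Huber parameter δ = 1.0 are illustrative assumptions.

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)           # squared differences -> sensitive to outliers

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))          # absolute differences -> more robust

def huber(y_true, y_pred, delta=1.0):
    err = y_true - y_pred
    quad = 0.5 * err ** 2                            # MSE-like for small errors
    lin = delta * (np.abs(err) - 0.5 * delta)        # MAE-like for large errors
    return np.mean(np.where(np.abs(err) <= delta, quad, lin))

y_true = np.array([3.0, 5.0, 2.5, 7.0])              # assumed sample values
y_pred = np.array([2.5, 5.0, 4.0, 8.0])
print(mse(y_true, y_pred), mae(y_true, y_pred), huber(y_true, y_pred))
```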
CLASSIFICATION LOSS FUNCTIONS
Used for evaluating how well predicted class labels match actual
labels
BINARY CROSS-ENTROPY LOSS (LOG LOSS)
• For binary classification (0 or 1)
• Measures difference between predicted probabilities and actual labels
CATEGORICAL CROSS-ENTROPY LOSS
• For multi-class classification with one-hot encoded labels
SPARSE CATEGORICAL CROSS-ENTROPY LOSS
• Similar to Categorical Cross-Entropy but uses integer labels (not one-hot)
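A minimal NumPy sketch of binary and categorical cross-entropy; the label and probability arrays are illustrative assumptions (the sparse variant uses the same math with integer labels instead of one-hot vectors).

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    p = np.clip(p_pred, eps, 1 - eps)                        # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def categorical_cross_entropy(y_true_onehot, p_pred, eps=1e-12):
    p = np.clip(p_pred, eps, 1.0)
    return -np.mean(np.sum(y_true_onehot * np.log(p), axis=1))

y_bin = np.array([1, 0, 1])                                  # assumed binary labels
p_bin = np.array([0.9, 0.2, 0.7])                            # predicted probabilities
print(binary_cross_entropy(y_bin, p_bin))

y_cat = np.array([[1, 0, 0], [0, 0, 1]])                     # one-hot labels
p_cat = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])         # predicted class probabilities
print(categorical_cross_entropy(y_cat, p_cat))
# Sparse variant: same math, but labels are integers (e.g., [0, 2]) instead of one-hot vectors.
```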
OPTIMIZATION TECHNIQUES
The optimizer’s role is to find the best combination of
weights and biases that leads to the most accurate
predictions.
GRADIENT DESCENT
Gradient Descent is an optimization
algorithm used to update the parameters
of the ANN during training.
• Cost function and loss function are often used
interchangeably, but conceptually they are slightly different.
• Loss/error is what we compute for a single
training example/input sample.
• A cost function, on the other hand, is the
average loss/error over the entire training
dataset (batch/mini-batch).
• The optimization algorithms aim at
minimizing the cost function.
CONTOUR PLOT OF THE LOSS/ERROR SURFACE
AVOIDING LOCAL MINIMA!!!
• If the learning rate is too large, we can overshoot.
• If the learning rate is too small, convergence is very slow.
RANDOM WEIGHT
INITIALIZATION
• Random weight initialization ensures diverse
starting points on the loss surface.
• Training is empirical — multiple runs help reach
better local minima.
• Loss functions are often non-convex → multiple
local minima.
• In high-dimensional spaces, gradients rarely fall
to exactly zero → we approximate minima.
• Local optima often yield good results; searching
for global minima is unnecessary and
computationally expensive.
OPTIMIZE THE LOSS AND UPDATE THE MODEL PARAMETERS
GRADIENT DESCENT AND ITS VARIANTS
GRADIENT DESCENT
• Batch Gradient Descent: uses the entire dataset for each update
• Stochastic Gradient Descent: uses a single observation for each update
• Mini-batch Gradient Descent: uses a subset of the data for each update
BATCH, SGD & MINI-BATCH GD
• Batch Gradient Descent uses the entire dataset to
compute the gradient and update parameters,
providing a smooth but sometimes slow convergence
path.
• Stochastic Gradient Descent (SGD) updates
parameters using one data point at a time, resulting in
a noisy but faster and more frequent update, which
helps escape local minima.
• Mini-Batch Gradient Descent strikes a balance by
using small batches of data for each update, combining
efficiency and stable convergence.
BATCH, SGD & MINI-BATCH GD
• Gradient descent involves computing the average over all
n examples, which can be time-consuming when the
training set grows.
• Mini-batch Gradient Descent avoids this by sampling a
mini-batch M of n′ examples from the training set.
• The gradient estimate is computed only from the examples
in the mini-batch M, not from the full training set.
• This algorithm is known as Stochastic Gradient Descent
(SGD) when n′ contains just one training example.
BATCH, SGD & MINI-BATCH GD
• If mini-batch size = 1: Stochastic Gradient Descent - every example (row)
is used as its own mini-batch; we lose the speed-up from vectorization.
• If mini-batch size = n: Batch Gradient Descent - the entire
training set (X, Y) is used; each iteration takes too long.
(A sketch of all three variants follows below.)
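A minimal NumPy sketch contrasting the three variants on a simple linear-regression loss; the synthetic data, learning rate, epoch count, and mini-batch size of 32 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=1000)

def gradient(w, Xb, yb):
    """Gradient of the MSE loss for a linear model on a batch (Xb, yb)."""
    return Xb.T @ (Xb @ w - yb) / len(Xb)

def train(batch_size, lr=0.1, epochs=20):
    w = np.zeros(3)
    n = len(X)
    for _ in range(epochs):
        idx = rng.permutation(n)                      # shuffle the examples each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            w -= lr * gradient(w, X[batch], y[batch]) # parameter update on this batch
    return w

print(train(batch_size=len(X)))  # Batch GD: whole dataset per update
print(train(batch_size=1))       # SGD: one example per update
print(train(batch_size=32))      # Mini-batch GD: small subset per update
```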
ADAPTIVE MOMENTUM (ADAM) OPTIMIZER
An optimization algorithm that combines Momentum and RMSProp to adaptively adjust
learning rates during training. It is efficient, requires minimal tuning, and performs
well on large and complex datasets.
EFFECT OF THE LR ON TRAINING
• The learning rate (LR) controls how much the
model's weights are updated during training.
• A high LR may cause the model to overshoot the
minimum, while a low LR can lead to slow
convergence or getting stuck in local minima.
PROJECT DEMO
This project predicts whether a user will generate revenue based on behavioral and technical
features using an Artificial Neural Network (ANN). The model is trained on a customer behavior
dataset.
Customer Purchase Prediction
Dataset:
Dataset Source: UCI Online Shoppers Purchasing Intention Dataset
Target Variable: Revenue (Yes / No)
Total Samples: ~12,330 records
Classes: 0 → No Revenue, 1 → Revenue Generated
Train-Test Split: Training Samples: 80%, Testing Samples: 20%
Note: Dropped less relevant features, encoded categorical data, and applied StandardScaler for
normalization.
PROJECT DEMO
Model Flow
• Data Loading: Load the dataset and split features (X) and target (y).
• Dropping Unnecessary Features: Drop less important features to simplify the model.
• Label Encoding: Convert categorical columns to numeric form.
• One-Hot Encoding: Encode multiple categories without order using One-Hot.
• Data Splitting: Split data into training and test sets.
• Feature Scaling: Normalize features to improve model performance.
• Model Architecture (ANN): Build a multi-layer ANN with dropout to avoid overfitting.
Model Architecture (ANN)
output
The model has 5 dense layers with
ReLU activation and dropout layers
to reduce overfitting. The final dense
layer uses sigmoid for binary
classification. Total trainable
parameters: 86,785.
Model Compilation & Training
Compile and train the ANN using Adam
optimizer.
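A minimal Keras sketch of the pipeline and ANN described above (dropping/encoding/scaling, dense ReLU layers with dropout, a sigmoid output, and Adam with binary cross-entropy); the file name, layer widths and count, dropout rates, and epoch count are illustrative assumptions rather than the project's exact values, so the parameter count will differ from the 86,785 reported above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras import layers, models

# Assumed CSV with the UCI Online Shoppers features and a 'Revenue' target column
df = pd.read_csv("online_shoppers_intention.csv")
df = pd.get_dummies(df, drop_first=True)            # encode categorical columns
X = df.drop("Revenue", axis=1).values
y = df["Revenue"].astype(int).values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = models.Sequential([
    layers.Input(shape=(X_train.shape[1],)),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),                             # dropout to reduce overfitting
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(32, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),           # binary output: revenue / no revenue
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=20, batch_size=32, validation_split=0.1)
print(model.evaluate(X_test, y_test))
```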
Prediction & Evaluation
Model achieved 83.45% accuracy, evaluated using a confusion matrix on test
data.
WHAT IS CNN?
Convolutional Neural Networks (CNNs) are deep learning models designed to recognize patterns in
images. They use special layers to automatically detect features like edges and shapes, making them
very effective for tasks like image classification and object detection.
MOTIVATION BEHIND USING
CNN
1. Convolution leverages three important ideas to improve ML systems:
i) Sparse interactions
ii) Parameter sharing
iii) Equivariant representations
2. Convolution also allows for working with inputs of variable size
SPARSE INTERACTION
Sparse interaction means each neuron
in a layer connects only to a small
region of the previous layer (called the
local receptive field), instead of all
neurons. This helps CNNs focus on
important local features in images
without using too many connections.
Sparse Connectivity vs. Dense Connectivity
SPARSE INTERACTION
Key Features:
• Sparse Connectivity → Local receptive fields
Each neuron in layer s connects only to a few neighboring inputs (x).
Example: s3 connects only to x2, x3, and x4, not all inputs. This is
the local receptive field idea.
• Local Neighbourhood Processing
Because each neuron in s sees only a small region of the input
space, it’s only looking at nearby pixels/features.
• Efficient Learning → Reduced Input Size
Sparse connections mean fewer computations and less memory,
which helps models learn faster and scale better.
SPARSE INTERACTION
This diagram shows sparse connectivity across
multiple layers, which is common in Convolutional
Neural Networks (CNNs). Each neuron is connected
only to a small, localized group of neurons from the
previous layer, forming local receptive fields. As we
move deeper through the layers (x → h → g), the
receptive field increases, allowing the network to
understand larger and more complex patterns while
keeping the number of connections and
computations efficient. This layered structure
enables CNNs to extract both local and global
features effectively.
PARAMETER SHARING
• Same parameters (weights) are reused across different
parts of the input.
• Common in Convolutional Neural Networks (CNNs).
• A kernel/filter slides across the image and applies the
same weights.
• Helps in detecting the same feature (like edges or eyes)
anywhere in the image.
• Reduces memory usage and improves efficiency.
• Fewer parameters to learn → faster training and better
generalization.
The same filter detects both eyes — even though
they’re in different places
PARAMETER SHARING
• In CNNs, the same filter (kernel) is applied to every position of the input
image.
• This means each weight in the filter is reused across the entire input.
Suppose:
Input size = m × n
Kernel size = k × k
In a fully connected layer, parameters = m × n
In a convolutional layer, parameters = k × k (independent of
input size)
So, parameter count is reduced from m × n → k × k
• Runtime of forward propagation remains O(k × n) (efficient)
• But storage requirement drops to just k × k parameters (much
smaller)
PARAMETER SHARING
In the figure, the black arrows show how a single
weight from a 3-element kernel is applied across
different locations in the input. In contrast, the
bottom part shows a fully connected model where
each weight is used only once for one connection.
Since fully connected layers do not share parameters,
they need many more weights compared to
convolutional layers.
EDGE DETECTION
• Edge detection is used to highlight the boundaries in an image — like where one object ends
and another begins.
• It is done using special filters (kernels) that detect changes in pixel intensity.
• Two common filters are:
Mx - for detecting vertical edges
My - for detecting horizontal edges
• CNNs have revolutionized edge detection by learning hierarchical features directly from data.
• Traditional methods like Sobel and Canny rely on handcrafted filters, while CNN-based
methods learn optimal filters during training.
EDGE DETECTION
• These filters are applied (convolved)
with the input image.
• The result is two output images
showing detected edges in
horizontal and vertical directions.
• These are useful in tasks like object
detection, segmentation, and
feature extraction in CNNs.
EDGE DETECTION
In Convolutional Neural Networks
(ConvNets), edge detection identifies
patterns by matching small image sections
with a filter. Blue and green boxes mark
where the filter aligns with the image,
spotting edges between black and white
squares. This helps the network extract key
features for tasks like image classification.
SOBEL FILTERS
Sobel filters are special 3x3 kernels used in image
processing to detect edges in an image. They work by
emphasizing differences in pixel values, which helps
identify boundaries between objects (edges). There
are two main types:
1.Horizontal Sobel Filter: Detects horizontal edges
by looking for changes in pixel intensity along the
vertical direction.
2. Vertical Sobel Filter: Detects vertical edges by
looking for changes in pixel intensity along the
horizontal direction.
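A minimal NumPy/SciPy sketch of applying the two Sobel kernels by convolution; the tiny test image (a dark-to-bright vertical edge) is an illustrative assumption.

```python
import numpy as np
from scipy.signal import convolve2d

# Sobel kernels: Gx responds to intensity changes along the horizontal direction (vertical edges),
# Gy responds to intensity changes along the vertical direction (horizontal edges).
Gx = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]])
Gy = np.array([[-1, -2, -1],
               [ 0,  0,  0],
               [ 1,  2,  1]])

# Illustrative 6x6 image: dark left half, bright right half (a vertical edge)
img = np.zeros((6, 6))
img[:, 3:] = 1.0

edges_x = convolve2d(img, Gx, mode="valid")   # strong response along the vertical edge
edges_y = convolve2d(img, Gy, mode="valid")   # near zero: no horizontal edges in this image
magnitude = np.sqrt(edges_x**2 + edges_y**2)
print(magnitude)
```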
EQUIVARIANCE OF CONVOLUTION TO TRANSLATION
Translational equivariance, or just
equivariance, is a very important
property of convolutional neural
networks: the position of an object
in the image does not need to be
fixed for it to be detected by the
CNN.
EQUIVARIANCE OF CONVOLUTION TO TRANSLATION
• Equivariant means that if the input changes, the output changes in the same way.
• A function f(x) is equivariant to a function g if f(g(x)) = g(f(x))
• If g is a function that translates the input, i.e., that shifts it, then the convolution
function is equivariant to g. Let I(x,y) be the image brightness at point (x,y).
• I′ = g(I) is the image function with I′(x,y) = I(x−1, y), i.e., g shifts every pixel of I one unit to
the right.
• If we apply g to I and then apply convolution, the output will be the same as if we
applied convolution to I, then applied the transformation g to the output.
CONVOLUTION: THE MATH BEHIND THE MATCH
• Line up the feature and the image patch.
• Multiply each image pixel by the corresponding feature/filter pixel
(element-wise product).
• Add them up.
• Divide by the total number of pixels in the feature.
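A minimal NumPy sketch of this match computation; the 2×2 kernel and patches are the values used in the filtering walk-through that follows, and the divide-by-pixel-count step is exposed as an optional flag.

```python
import numpy as np

def match(patch, feature, normalize=False):
    """Element-wise multiply the image patch with the feature/kernel, then sum.
    Optionally divide by the number of pixels in the feature (normalized match)."""
    score = np.sum(patch * feature)
    return score / feature.size if normalize else score

kernel = np.array([[ 1, 1],
                   [-1, 1]])

# Patches used in the filtering walk-through on the next slides
patch_a = np.array([[1, 1],
                    [1, 1]])
patch_b = np.array([[1, -1],
                    [1,  1]])

print(match(patch_a, kernel))   # 1*1 + 1*1 + (-1)*1 + 1*1 = 2
print(match(patch_b, kernel))   # 1*1 + 1*(-1) + (-1)*1 + 1*1 = 0
```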
FILTERING: THE MATH BEHIND THE
MATCH
A kernel (small grid) matches with a part of the
image by multiplying their values. The green 2x2
kernel (1, 1, -1, 1) overlaps with a yellow 2x2 image
patch (1, 1, 1, 1). Multiply each pair: 1×1 + 1×1 + (-1)×1
+ 1×1 = 1 + 1 - 1 + 1 = 2. This result (2) shows how well
the kernel matches the image patch, helping detect
features like edges.
FILTERING: THE MATH BEHIND THE
MATCH
After multiplying and adding the kernel
(1, 1, -1, 1) with the image patch (1, 1, 1,
1), the result is 2. This value goes into a
new grid called a feature map, shown in
the blue box (top-left corner). As the
kernel slides over the entire image, it
fills the feature map with values,
highlighting where patterns like edges
are found.
FILTERING: THE MATH BEHIND THE
MATCH
The kernel (1, 1, -1, 1) slides to the next
image patch (1, -1, 1, 1). Multiply and
add: 1×1 + 1×(-1) + (-1)×1 + 1×1 = 1 - 1 - 1
+ 1 = 0. This result (0) is placed in the
feature map, next to the previous value
(1). The kernel keeps sliding to fill the
feature map with values that show
where patterns match.
FILTERING: THE MATH BEHIND THE
MATCH
The kernel (1, 1, -1, 1) moves to the next
image patch (1, -1, 1, 1). Multiply and
add: 1×1 + 1×(-1) + (-1)×1 + 1×1 = 1 - 1 - 1
+ 1 = 0. This result (0) is added to the
feature map, following the previous
values (1, 1). The kernel keeps sliding
across the image to build the feature
map, showing where patterns match.
FILTERING: THE MATH BEHIND THE
MATCH
The kernel (1, 1, -1, 1) slides to the next
image patch (1, 1, 1, 1). Multiply and
add: 1×1 + 1×1 + (-1)×1 + 1×1 = 1 + 1 - 1 +
1 = 2. This result (2) is added to the
feature map, following the previous
values (1, 1, 1). The kernel keeps moving
across the image to complete the
feature map, revealing pattern matches.
FILTERING: THE MATH BEHIND THE
MATCH
The kernel (1, 1, -1, 1) slides to the next
image patch (1, 1, 1, 1). Multiply and
add: 1×1 + 1×1 + (-1)×1 + 1×1 = 1 + 1 - 1 +
1 = 2. This result (2) is added to the
feature map, following the previous
values (1, 1, 1, 1). The kernel keeps
sliding to fill the feature map, showing
where patterns match.
FILTERING: THE MATH BEHIND THE
MATCH
The kernel (1, 1, -1, 1) slides to the next
image patch (1, 1, 1, 1). Multiply and
add: 1×1 + 1×1 + (-1)×1 + 1×1 = 1 + 1 - 1 +
1 = 2. This result (2) is added to the
feature map, following the previous
values (1, 1, 1, 1, 1). The kernel keeps
sliding to fill the feature map, showing
where patterns match.
FILTERING: THE MATH BEHIND THE
MATCH
A 3x3 kernel (1, -1, -1, 1, 1, -1, 1, -1, 1)
overlaps with a 3x3 image patch (1, 1,
1, 1, 1, 1, 1, 1, 1). Multiply each pair
and add: 1×1 + (-1)×1 + (-1)×1 + 1×1 +
1×1 + (-1)×1 + 1×1 + (-1)×1 + 1×1 = 1 - 1
- 1 + 1 + 1 - 1 + 1 - 1 + 1 = 1. Divide by 9
(total pixels in the kernel) to get about 0.11. This
value goes into the feature map.
CONVOLUTION: APPLYING A SHARPEN
FILTER
A 3x3 sharpen filter (-1, -1, -1, -1, 8, -1, -1, -1, -1)
slides over a 5x5 image. For the green patch
(21, 19, 17, 71, 76, 73, 153, 164, 164), multiply
and add: (-1)×21 + (-1)×19 + (-1)×17 + (-1)×71 +
8×76 + (-1)×73 + (-1)×153 + (-1)×164 + (-1)×164 = -
21 - 19 - 17 - 71 + 608 - 73 - 153 - 164 - 164 = -74.
This result (-74) goes into the feature map,
highlighting edges in the image.
FILTERS AND CONVOLUTIONAL FEATURE
MAP
• Filters (or kernels) are small grids of numbers used
to find patterns in an image, like edges or textures.
They slide over the image in a process called
convolution, calculating new values at each step.
This creates a feature map that highlights the
patterns found.
• In this example, two Sobel filters are used: one
detects horizontal edges, and the other detects
vertical edges. These feature maps help in
understanding important parts of the image and are
widely used in computer vision and deep learning.
APPLYING A BANK OF FILTERS
A filter bank contains multiple filters that
detect different patterns in an image.
When applied through convolution, they
create feature maps that highlight edges,
textures, and shapes. These filters are
learned automatically during training to
help the model understand the image.
CONVOLUTION MULTIPLE
CHANNELS
Convolution also works with multiple input channels, such as a color image with Red,
Green, and Blue (RGB) layers. Each filter is also made up of multiple layers, one for
each channel. The filter is applied across all channels of the input, and the results are
summed to produce a single value in the output feature map. This process is
repeated for each filter to create multiple output channels. This technique helps
capture more complex features by combining information from all color channels.
CONVOLUTION MULTIPLE
CHANNELS
This example shows convolution with multiple channels using a 3D
input volume of size 7×7×3 (height × width × channels).
Each filter (W0 and W1) is also 3D (3×3×3) to match the input depth.
1.The input has 3 channels (like an RGB image), shown in blue boxes.
2. Filter W0 (in red) is applied across all 3 input channels. For each
position:
• Each 3×3 slice from the input is multiplied with the matching filter
slice.
• The results from all 3 slices are summed.
• A bias (b0) is added to produce a single number.
3.This process is repeated across the image to produce one output
channel (green numbers).
4.Similarly, Filter W1 produces a second output channel using the
same steps.
5.The final output volume is 3×3×2, where 2 is the number of filters
used.
CNN APPLYING AN ACTIVATION FUNCTION
For a neuron at position (p, q) in the hidden layer:
Output(p, q) = f( Σ_i Σ_j w_ij · x_(i+p)(j+q) + b )
where:
• x_(i+p)(j+q): input values from a 4×4 patch
• w_ij: weights from the filter
• b: bias term
• f: activation function (e.g., ReLU)
This equation describes:
1. Taking a weighted sum of the input patch
2. Adding a bias
3. Passing the result through an activation function
CONVOLUTION LAYER
This is how a convolution layer works with a 3D
input. The input image is of size 32×32×3 (e.g.,
RGB), and the filter size is 5×5×3. The filter always
matches the depth of the input. It slides over the
image spatially (width and height), computing dot
products at each position. This operation helps
extract meaningful features like edges and
textures from the input.
CONVOLUTION LAYER
A 5×5×3 filter is applied to a 32×32×3 input image
by taking a small 5×5×3 patch (75 values) and
computing a dot product with the filter weights.
After adding a bias, this gives one output value.
This operation is repeated across the entire
image.
Mathematically, it's represented as:
CONVOLUTION LAYER
A 5×5×3 filter is applied to a 32×32×3 input image
by taking a small 5×5×3 patch (75 values) and
computing a dot product with the filter weights.
After adding a bias, this gives one output value.
This operation is repeated across the entire
image.
Mathematically, it's represented as output = w^T x + b, where:
• w^T: transpose of the weight vector
• x: input vector
• b: bias term
CONVOLUTION LAYER
A 5×5×3 filter is slid over a 32×32×3 input volume
to compute dot products at each spatial location.
Each computation produces a single number, and
this operation is repeated across the entire input.
The result is a 28×28×1 activation map, where the
reduced spatial dimensions are due to the filter
size and no padding being applied. This activation
map captures local patterns from the input using
the learned filter.
CONVOLUTION LAYER
By using a second 5×5×3 filter, an additional
28×28 activation map is generated. Each filter
captures different features from the same input
volume. Stacking the results from both filters
creates multiple activation maps, allowing
deeper representation of the input. This helps
the network learn various patterns like edges,
textures, or colors.
CONVOLUTION LAYER
Using 6 filters of size 5×5, each spanning the
full depth of the input (3 channels), produces 6
distinct 28×28 activation maps. These maps are
stacked depth-wise to form a new volume of
size 28×28×6, which serves as the output of the
convolution layer. Each filter learns to detect
different features from the same input.
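A minimal Keras sketch reproducing the shapes described above (a 32×32×3 input and six 5×5 filters with no padding giving a 28×28×6 output); the random test input is an illustrative assumption.

```python
import numpy as np
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),                     # 32x32 RGB input volume
    layers.Conv2D(filters=6, kernel_size=(5, 5),
                  padding="valid", activation="relu"),   # six 5x5x3 filters, no padding
])
model.summary()                                          # output shape: (None, 28, 28, 6)

x = np.random.rand(1, 32, 32, 3).astype("float32")
print(model(x).shape)                                    # (1, 28, 28, 6)
```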
POOLING
LAYER
Pooling layers in CNNs shrink the feature map’s width and height while keeping important features. A
small window slides over the feature map, summarizing each region
For a feature map with dimensions H × W × C, a pooling layer with filter size f and stride s
produces an output of dimensions ⌊(H − f)/s + 1⌋ × ⌊(W − f)/s + 1⌋ × C.
Types of Pooling
Layers:
• Max Pooling
• Average Pooling
SPATIAL DIMENSION - STRIDES
Spatial dimension strides refer to how the filter moves across the input during convolution. The
stride is the number of pixels the filter shifts at each step. It directly affects the size of the output
(feature map) — larger strides result in smaller outputs.
SPATIAL DIMENSION - STRIDES
A 5×5×3 filter slides over a 32×32×3 input with a stride of 1. At each location, it computes a dot product,
resulting in one value in the output. This process continues across all spatial positions, generating a
28×28 activation map. The reduction in size is due to the filter not being applied on the border pixels,
which reduces the width and height by 4 (32 - 5 + 1 = 28).
MAX POOLING: SHRINKING THE
DIMENSION
• Max Pooling shrinks a feature map by sliding a
small window over it and taking the largest value
in each region. It reduces the size, keeps
important features, and makes the CNN faster
and more efficient.
• A 2x2 window slides over a 4x4 feature map with
a stride of 2. In each region, the largest value is
taken: [1, 1, 5, 6] becomes 6, [2, 4, 7, 8] becomes
8, [3, 2, 1, 2] becomes 3, and [1, 0, 3, 4] becomes
4. The result is a smaller 2x2 feature map [6, 8, 3,
4]. This reduces the size, keeps key features, and
makes the CNN more efficient.
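A minimal NumPy sketch of 2×2 max pooling with stride 2, using a 4×4 feature map reconstructed from the regions listed above.

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2: take the largest value in each non-overlapping 2x2 region."""
    h, w = fmap.shape
    out = np.zeros((h // 2, w // 2))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            out[i // 2, j // 2] = fmap[i:i + 2, j:j + 2].max()
    return out

feature_map = np.array([[1, 1, 2, 4],
                        [5, 6, 7, 8],
                        [3, 2, 1, 0],
                        [1, 2, 3, 4]])
print(max_pool_2x2(feature_map))   # [[6. 8.] [3. 4.]]
```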
AVERAGE POOLING: SHRINKING THE
DIMENSION
• Average Pooling shrinks a feature map by sliding
a small window over it and taking the average
value of each region, reducing size while
summarizing features.
• A 2x2 window slides over a 4x4 feature map with
a stride of 2. In each region, the average value is
taken: [4, 3, 1, 3] becomes 2.75, [1, 5, 4, 8]
becomes 4.5, [4, 5, 6, 5] becomes 5.0, and [4, 3, 9,
4] becomes 5.0. The result is a smaller 2x2
feature map [2.75, 4.5, 5.0, 5.0]. This reduces the
size, summarizes features, and makes the CNN
more efficient.
CONVOLUTION – REDUCTION IN FEATURE
DIMENSION
A 3x3 filter slides over a 4x4 input with padding
to keep the output size the same. The input
(4x4) is padded with zeros around the edges,
making it effectively larger, and the filter
processes each region to produce a 4x4 output.
This maintains the size but extracts features,
preparing the data for further steps like pooling.
‘VALID’ PADDING – REDUCTION IN FEATURE
DIMENSION
• Valid padding in convolution reduces the output
size compared to the input, in contrast to the
'same' padding approach shown on the next slide.
• A 2x2 filter slides over a 4x4 input with valid
padding, meaning no extra zeros are added
around the edges. With a stride of 1, the
filter produces a 3x3 output: [1.25, 0.5, 0.5,
0.5, 0.75, 1.5, 0.25, 1.25, 1]. This reduces the
size of the feature map while extracting key
features for the CNN.
‘SAME’ PADDING – NO REDUCTION IN FEATURE
DIMENSION
• Same padding in convolution maintains the output
size by adding zeros around the input, in contrast
to valid padding.
• A 2x2 filter slides over a 4x4 input with same
padding, adding zeros around the edges to
keep the output size 4x4. With a stride of 1,
the filter produces a 4x4 output: [0.5, 0, 0.25,
0.25, 0, 1.25, 0.5, 0.5, 0, 0.5, 0.75, 1.5, 0.5,
0.25, 1.25, 1]. This preserves the size while
extracting features for the CNN.
TYPICAL PROCESSING BLOCKS IN A CNN
A CNN takes input feature maps, applies filters in the convolution layer to create convolution
feature maps with features like edges, then uses the pooling layer to shrink them into smaller
pooling feature maps. These layers work together to process images efficiently.
CNN ARCHITECTURE
CNN ARCHITECTURE
A Convolutional Neural Network (CNN) is made up of layers that automatically learn
features from images. It usually starts with convolution layers that apply filters to
detect patterns like edges or textures. Then, pooling layers reduce the size of the
data while keeping important information. After several layers of convolution and
pooling, the output is flattened into a vector and passed through fully connected
layers, which make the final prediction. CNNs are widely used in image classification
and other computer vision tasks because they can learn to focus on important parts
of the image.
FULLY-CONNECTED LAYERS
Fully-connected layers in a CNN are similar to those
in a traditional neural network (MLP). They connect
every neuron in one layer to every neuron in the next.
After convolution and pooling layers extract features,
the fully-connected layer takes these features,
flattens them into a 1D vector, and uses them for final
classification. This part of the network learns patterns
and makes predictions like digit labels or object
categories.
TRAINING IN
CNN
CLASSIFICATION WITH FC LAYERS
After the convolution and pooling layers, we add Fully Connected
Layers at the end to enable the network to learn complex patterns from the
feature maps generated by the previous convolutional layers.
• CNN layers act as a feature extractor.
• FC layers act as a feature classifier.
CONVNET (IMAGE CLASSIFICATION) = CNN FEATURE
EXTRACTOR + FC LAYER FEATURE CLASSIFIER
DROPOUT LAYER
A Dropout layer helps prevent overfitting by
randomly turning off some neurons during
training. This forces the model to learn
more robust features, improving its ability
to generalize well on new, unseen data.
PROJECT DEMO
This project classifies images into predefined categories using a Convolutional Neural
Network (CNN) trained on the Vehicle Image Classification Dataset.
Vehicles Multiclass Classification
Dataset:
The dataset consists of 7 classes: Auto Rickshaws, Bikes, Cars, Motorcycles, Planes, Ships,
Trains
Total Images taken - 5590
Split into Training - 3906, Testing - 839 and Validation - 845
Note: Missing images may result from file corruption, naming conflicts, path errors, split
issues, or preprocessing losses.
PROJECT DEMO
Model
Flow
Train/Validation/Test Split - Sample
Code
Images are split into train, validate, and test sets using train_test_split()
with fixed random_state. Each image is copied into class-wise folders
using shutil.copy2(), preserving structure for model training.
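A minimal sketch of this split-and-copy step; the folder names, split ratios, and random_state are illustrative assumptions.

```python
import os
import shutil
from sklearn.model_selection import train_test_split

src_dir = "Vehicles"                       # assumed source folder with one sub-folder per class
dst_root = "dataset_split"
classes = sorted(os.listdir(src_dir))

for cls in classes:
    images = [os.path.join(src_dir, cls, f) for f in os.listdir(os.path.join(src_dir, cls))]
    # First carve out the test set, then split the remainder into train/validate
    train_val, test = train_test_split(images, test_size=0.15, random_state=42)
    train, val = train_test_split(train_val, test_size=0.18, random_state=42)
    for split_name, files in [("train", train), ("validate", val), ("test", test)]:
        out_dir = os.path.join(dst_root, split_name, cls)
        os.makedirs(out_dir, exist_ok=True)
        for f in files:
            shutil.copy2(f, out_dir)       # copy preserving metadata, keeping class-wise folders
```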
Pre-processing - Sample
Code
ImageDataGenerator loads images from specified directories (flow_from_directory),
applies rescale and target_size, and categorizes them into 7 classes as per class_mode.
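A minimal Keras sketch of this pre-processing step; the directory paths, target size, and batch size are illustrative assumptions.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1.0 / 255)              # scale pixel values to [0, 1]

train_generator = datagen.flow_from_directory(
    "dataset_split/train",                                   # assumed layout from the split step
    target_size=(128, 128),
    batch_size=32,
    class_mode="categorical")                                # 7 classes -> one-hot labels

validate_generator = datagen.flow_from_directory(
    "dataset_split/validate", target_size=(128, 128),
    batch_size=32, class_mode="categorical")

test_generator = datagen.flow_from_directory(
    "dataset_split/test", target_size=(128, 128),
    batch_size=32, class_mode="categorical", shuffle=False)  # keep order for the confusion matrix
```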
CNN - Sample
Code
Conv2D layers extract features, MaxPooling2D downsizes, Flatten prepares for Dense layers, Dropout
reduces overfitting, and softmax outputs num_classes probabilities; Adam optimizes with
categorical_crossentropy for multi-class classification.
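A minimal Keras sketch of the CNN just described; the kernel sizes, input size, and dropout rate are illustrative assumptions, so the parameter count will not exactly match the 4,829,319 reported on the next slide.

```python
from tensorflow.keras import layers, models

num_classes = 7

model = models.Sequential([
    layers.Input(shape=(128, 128, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),       # feature extraction
    layers.MaxPooling2D((2, 2)),                        # downsizing
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                                   # prepare for Dense layers
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                                # reduce overfitting
    layers.Dense(num_classes, activation="softmax"),    # class probabilities
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```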
CNN - Sample Code
Output
The Model: "sequential" has layers: conv2d (32, 64, 128 filters), max_pooling2d, flatten,
dense (128, 7 units), and dropout; total params: 4,829,319 (trainable params:
4,829,319).
Training the Model with Data Generators - Sample
Code
model.fit() runs training on train_generator data for 20 epochs, evaluating
performance on validation_data from validate_generator.
Model Evaluation - Sample
Code
model.evaluate() assesses the model on test_generator, computing loss with
categorical_crossentropy and accuracy as metrics, processing all batches (35/35).
Plotting Training and Validation Metrics - Sample
Code
The code uses plt.figure(figsize=(12,
4)), plt.subplot(), and plt.plot() to
graph history.history['accuracy']
(Train Acc),
history.history['val_accuracy'] (Val
Acc), history.history['loss'] (Train
Loss), and history.history['val_loss']
(Val Loss) with plt.legend() and
plt.title().
Plotting Training and Validation Metrics - Sample Code
output
The Accuracy plot shows Train Acc rising to ~0.9, Val Acc fluctuating around 0.7-0.8, indicating
overfitting; the Loss plot shows Train Loss dropping to ~0.2, Val Loss decreasing but unstable at
~0.4, suggesting inconsistent validation performance.
Confusion Matrix - Sample
Code
confusion_matrix compares
true_classes and predicted_classes,
sns.heatmap visualizes errors with
class_labels on axes, and
classification_report provides
precision, recall, and F1-score per
class.
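A minimal sketch of this evaluation step; it assumes the trained model and the non-shuffled test_generator from the earlier sketches.

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, classification_report

# Predictions on the (non-shuffled) test generator from the earlier sketches
pred_probs = model.predict(test_generator)
predicted_classes = np.argmax(pred_probs, axis=1)
true_classes = test_generator.classes
class_labels = list(test_generator.class_indices.keys())

cm = confusion_matrix(true_classes, predicted_classes)
sns.heatmap(cm, annot=True, fmt="d",
            xticklabels=class_labels, yticklabels=class_labels)
plt.xlabel("Predicted")
plt.ylabel("True")
plt.show()

print(classification_report(true_classes, predicted_classes, target_names=class_labels))
```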
Confusion Matrix - Sample Code
Output
The Confusion Matrix displays
prediction accuracy for 7
classes; high diagonal values
(e.g., 142 Auto Rickshaws, 159
Bikes) show correct predictions,
but errors occur, like 26 Planes
and 22 Trains misclassified as
Ships.
Classification Report - Sample Code
Output
The Classification Report shows Bikes
(0.97) and Motorcycles (0.91) with
highest f1-scores, Planes (0.81) and
Trains (0.83) lowest; overall accuracy is
0.87 for 1,116 samples.
Testing - Sample
Code
It processes a single image for
classification, rescaling and predicting
its class as "Bikes" using the trained
model. Predicted Correctly
THANK YOU

More Related Content

PDF
Deep learning
PPTX
Multilayer Perceptron Neural Network MLP
PPT
UNIT 5-ANN.ppt
PPTX
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
PPTX
Reason To Switch to DNNDNNs excel in handling huge volumes of data (e.g., ima...
PDF
Nural Network ppt presentation which help about nural
PPTX
Unit 2 ml.pptx
PPTX
4.2 Neural Networks Overviewwwwwwww.pptx
Deep learning
Multilayer Perceptron Neural Network MLP
UNIT 5-ANN.ppt
Deep Learning Interview Questions And Answers | AI & Deep Learning Interview ...
Reason To Switch to DNNDNNs excel in handling huge volumes of data (e.g., ima...
Nural Network ppt presentation which help about nural
Unit 2 ml.pptx
4.2 Neural Networks Overviewwwwwwww.pptx

Similar to Artificial intelligence learning presentations (20)

PPTX
Convolutional neural networks
PPTX
Deep learning Ann(Artificial neural network)
PPTX
Introduction to Neural Networks By Simon Haykins
PPTX
Artificial Neural Networks presentations
PDF
Artificial Neural Network
PDF
Artificial Intelligence and machine learning 2
PPTX
Introduction to Neural Netwoks
PPTX
Introduction to Neural Networks and its application
PPTX
14_cnn complete.pptx
PPTX
Artificial neural network by arpit_sharma
PDF
Artificial Neural Networks: Introduction, Neural Network representation, Appr...
PPTX
Neural Networks and its related Concepts
PPTX
Deep neural networks & computational graphs
PPTX
simple NN and RBM arch for slideshare.pptx
PPTX
Perceptron for neuron (Single Neuron).pptx
PPTX
Chapter-5-Part I-Basics-Neural-Networks.pptx
PPT
deep learning UNIT-1 Introduction Part-1.ppt
PPTX
Backpropagation and computational graph.pptx
PPTX
03 Single layer Perception Classifier
PPTX
Introduction to Perceptron and Neural Network.pptx
Convolutional neural networks
Deep learning Ann(Artificial neural network)
Introduction to Neural Networks By Simon Haykins
Artificial Neural Networks presentations
Artificial Neural Network
Artificial Intelligence and machine learning 2
Introduction to Neural Netwoks
Introduction to Neural Networks and its application
14_cnn complete.pptx
Artificial neural network by arpit_sharma
Artificial Neural Networks: Introduction, Neural Network representation, Appr...
Neural Networks and its related Concepts
Deep neural networks & computational graphs
simple NN and RBM arch for slideshare.pptx
Perceptron for neuron (Single Neuron).pptx
Chapter-5-Part I-Basics-Neural-Networks.pptx
deep learning UNIT-1 Introduction Part-1.ppt
Backpropagation and computational graph.pptx
03 Single layer Perception Classifier
Introduction to Perceptron and Neural Network.pptx
Ad

Recently uploaded (20)

PDF
Vision Prelims GS PYQ Analysis 2011-2022 www.upscpdf.com.pdf
PDF
IP : I ; Unit I : Preformulation Studies
PPTX
Module on health assessment of CHN. pptx
PDF
CISA (Certified Information Systems Auditor) Domain-Wise Summary.pdf
PDF
Farming Based Livelihood Systems English Notes
PDF
Τίμαιος είναι φιλοσοφικός διάλογος του Πλάτωνα
PDF
M.Tech in Aerospace Engineering | BIT Mesra
PDF
Skin Care and Cosmetic Ingredients Dictionary ( PDFDrive ).pdf
PDF
Race Reva University – Shaping Future Leaders in Artificial Intelligence
PDF
Myanmar Dental Journal, The Journal of the Myanmar Dental Association (2013).pdf
PDF
Journal of Dental Science - UDMY (2021).pdf
PDF
HVAC Specification 2024 according to central public works department
PDF
Everyday Spelling and Grammar by Kathi Wyldeck
PPTX
ELIAS-SEZIURE AND EPilepsy semmioan session.pptx
PPTX
Core Concepts of Personalized Learning and Virtual Learning Environments
PDF
Literature_Review_methods_ BRACU_MKT426 course material
PDF
Empowerment Technology for Senior High School Guide
PDF
CRP102_SAGALASSOS_Final_Projects_2025.pdf
PDF
English Textual Question & Ans (12th Class).pdf
PDF
AI-driven educational solutions for real-life interventions in the Philippine...
Vision Prelims GS PYQ Analysis 2011-2022 www.upscpdf.com.pdf
IP : I ; Unit I : Preformulation Studies
Module on health assessment of CHN. pptx
CISA (Certified Information Systems Auditor) Domain-Wise Summary.pdf
Farming Based Livelihood Systems English Notes
Τίμαιος είναι φιλοσοφικός διάλογος του Πλάτωνα
M.Tech in Aerospace Engineering | BIT Mesra
Skin Care and Cosmetic Ingredients Dictionary ( PDFDrive ).pdf
Race Reva University – Shaping Future Leaders in Artificial Intelligence
Myanmar Dental Journal, The Journal of the Myanmar Dental Association (2013).pdf
Journal of Dental Science - UDMY (2021).pdf
HVAC Specification 2024 according to central public works department
Everyday Spelling and Grammar by Kathi Wyldeck
ELIAS-SEZIURE AND EPilepsy semmioan session.pptx
Core Concepts of Personalized Learning and Virtual Learning Environments
Literature_Review_methods_ BRACU_MKT426 course material
Empowerment Technology for Senior High School Guide
CRP102_SAGALASSOS_Final_Projects_2025.pdf
English Textual Question & Ans (12th Class).pdf
AI-driven educational solutions for real-life interventions in the Philippine...
Ad

Artificial intelligence learning presentations

  • 2. ANN - IMITATE HUMAN BRAIN!!! • ANN’s are computational models inspired by the human brain's structure and function. • ANN acts as a brain. It has various layers which are interconnected to each other such as the input layer and the hidden layer
  • 3. BIOLOGICAL NEURON • Biological Neurons are of many different types • Dendrites receive signals from adjacent neurons, the cell body processes the input and • Generates activations that are passed through the axon to the other neurons. • Cell can perform complex non-linear computations • Synapses are not a single weight but a complex non-linear dynamical system
  • 4. PERCEPTRON • A perceptron is the simplest form of an artificial neuron and the fundamental building block of neural networks. • Introduced by Frank Rosenblatt in 1958, it’s a mathematical model inspired by biological neurons. • It’s a binary classifier that makes a decision by mapping input features to an output (0 or 1).
  • 5. • Input Features: Multiple features representing input data characteristics. • Weights: Each feature is assigned a weight determining its influence on output. • Summation Function: Calculates weighted sum of inputs, combining them with respective weights. • Activation Function: Passes weighted sum through Heaviside step function to produce binary output (0 or 1). Basic Components of Perceptron
  • 6. • Output: Determined by activation function, often used for binary classification tasks. • Bias: Helps make adjustments independent of input, improving learning flexibility. • Learning Algorithm: Adjusts weights and bias using a learning algorithm like Perceptron Learning Rule. Basic Components of Perceptron
  • 7. • Artificial Neuron connected with an input vector x=[x1 x2 x3 x4] • Input is multiplied with a weight vector w=[w1 w2 w3 w4] • The weighted input is summed up, weighted sum = x1* w1 + x2* w2 + x3* w3 + x4* w4 • A bias term 1 is added to form a net sum. Net sum = x1* w1 + x2* w2 + x3* w3 + x4* w4+ 1 •Net sum is finally passed through an activation function to produce the final output neuron How does a Perceptron Work?
  • 8. UNIT - STEP FUNCTION A simple activation function to produce a binary output from a given input n. We can define it as follows, w.r.t a chosen threshold t f(n) = 0 for n < t = 1, otherwise
  • 9. Step 1: Compute the weighted sum: HOW DOES A PERCEPTRON WORK? Step 2: Apply activation (step function): Training the Perceptron: • Initialize: Start with weights W = 0 and bias b = 0 • For Each Data Point:
  • 10. ACTIVATION FUNCTIONS Activation functions introduce non-linearity, enabling neural networks to solve complex problems.
  • 11. ACTIVATION FUNCTIONS Sigmoid: sigma(x) = 1/(1+e x) − outputs 0 to 1, used for binary classification (e.g., spam detection). Graph shows a smooth S-curve. Softmax: softmax(xi​ ) = exi​ / (exj​ )​ ​ ∑ for all j) outputs probabilities (summing to 1), used for multi-class classification (e.g., image labeling). Tanh: tanh(x) = 2 / 1 + e—2x outputs -1 to 1, similar to Sigmoid but centered at 0, good for balanced data. Graph shows a steeper S-curve. Leaky ReLU: it is a variant of the ReLU that addresses the dying neurons issue. Leaky ReLU makes provision for non-zero output even for negative values by introducing a small slope. ReLU: f(x) = max (0,x) outputs the input value if it is positive or returns zero. In simple words, ReLU only returns a positive output or zero.
  • 12. MULTILAYER PERCEPTRON • The input X is fed into the first layer, which is a multidimensional perceptron with a weight matrix W1 and bias vector b1. • The output of that layer is then fed into the second layer, which is again a perceptron with another weight matrix W2 and bias vector b2. • This process continues for every of the L layers until we reach the output layer. We refer to the last layer as the output layer and to every other layer as a hidden layer.
  • 13. FORWARD PROPAGATION IN MLP • An MLP with one hidden layers computes the function • an MLP with two hidden layers computes the function • Like this, during forward propagation the activations/output from neurons of one layers are passed to the neurons of the next layers. • and, generally, an MLP with L 1 hidden layers − computes the function
  • 14. BACKWARD PROPOGATION IN MLP • During Backward Propagation (during Training) - error of the loss function is calculated at the output layer. • We need to minimize the amount of error made by the neural network. This is done through an iterative process known as training. • The error of the loss function can be minimized by updating the network parameters (weights and biases) of the connections between neurons. • The gradients of the loss function is calculated w.r.t the weights using Backpropagation algorithm. • Applying Gradient Descent, the parameter update is performed (updating weights and biases). • This steps are repeated up to a fixed number of iterations or until the difference of errors in two successive iterations is less than a predetermined threshold (meeting the
  • 15. EXAMPLE Net Input to the nodes at the hidden layer: • h1= 0.81 * 0.5 + 0.12 * 1 + 0.92 * 0.8 • h2= 0.33 * 0.5 + 0.44 * 1 + 0.72 * 0.8 • h3= 0.29 * 0.5 + 0.22 * 1 + 0.53 * 0.8 • h4= 0.37 * 0.5 + 0.12 * 1 + 0.27 * 0.8
  • 16. COMPUTING NET INPUT TO NEURONS AT HIDDEN LAYER The image depicts a neural network layer where biases b1, b2, b3 are added to hidden nodes h1, h2, h3 alongside weighted inputs from x1, x2, x3. These biases shift the activation function, helping the network better fit the data by adjusting the output of each neuron.
  • 17. COMPUTING NET INPUT TO NEURONS AT HIDDEN LAYER Passes through an activation function to generate the output of the neurons.
  • 18. WEIGHTS BETWEEN INPUT-HIDDEN LAYERS The connection between input and hidden layers in a neural network, where inputs x1, x2, x3 are linked to hidden nodes h1, h2, h3, h4 via weights. The weight matrix defines the strength of these connections, crucial for transforming input data in the network.
  • 19. COMPUTING NET INPUT TO NEURONS AT OUTPUT LAYER The image says the backward propagation in a neural network, where the error between predicted outputs o1, o2 and actual targets is calculated and propagated backward from the output layer to hidden layers and inputs x1, x2, x3. This process adjusts weights and biases to minimize the error, enabling the network to learn effectively.
  • 20. COMPUTING NET INPUT TO NEURONS AT OUTPUT LAYER • The diagram shows an artificial Neuron connected with an input vector x=[x1 x2 x3 x4] • The input is multiplied with a weight vector w=[w1 w2 w3 w4] • The weighted input is summed up, weighted sum = x1* w1 + x2* w2 + x3* w3 + x4* w4 • A bias term 1 is added to form a net sum. Net sum = x1* w1 + x2* w2 + x3* w3 + x4* w4+ 1 • The Net sum is finally passed through an activation function to produce the final output neuron
  • 21. IMPORTANT POINTS TO REMEMBER ABOUT MLP MLPs are connectionist computational models • We can solve classification and regression problems • MLPs can compose Boolean functions • MLPs can compose real-valued functions • MLPs are Universal function approximators (Universal approximation Theorem) • MLPs can represent any function if > it is sufficiently wide (number of neurons in a hidden layer) > it is sufficiently deep (number of hidden layers) > depth can be traded off for (sometimes) exponential growth of the width of the network • Optimal width and depth depend on the number of input variables and the complexity of the function it is trying to model
  • 22. ANN – REGRESSION AND CLASSIFICATION In regression, ANNs aim to predict continuous numerical values, while in classification, they aim to categorize data into discrete classes.
  • 23. • Goal: To predict a continuous numerical output. • Example: Predicting house prices, stock prices, or temperature. • Output Layer: A linear activation function is typically used in the output layer. • Loss Function: Mean Squared Error (MSE) is commonly used as the loss function. • Applications: Predicting quantities, estimating values, and forecasting trends. REGRESSION • Goal: To categorize data into discrete classes or categories. • Example: Classifying emails as spam or not spam, recognizing handwritten digits, or identifying image objects. • Output Layer: A Softmax activation function is used to produce probabilities for each class. • Loss Function: Cross-entropy loss is typically used as the loss function. • Applications: Pattern recognition, object detection, and sentiment analysis. CLASSIFICATION
  • 24. LOSS FUNCTIONS • Measures how good or bad model predictions are compared to actual results • Outputs a single number showing error magnitude — smaller is better • Used to guide model training (e.g., via Gradient Descent) • Helps evaluate model performance and influences learning behavior Why are Loss Functions Important? • Guide the optimization of model parameters • Measure difference between predicted and true values • Different loss functions suit different tasks and affect model learning
  • 25. REGRESSION LOSS FUNCTIONS Used for predicting continuous values (e.g., price, age) MEAN SQUARED ERROR (MSE) LOSS MEAN ABSOLUTE ERROR (MAE) LOSS HUBER LOSS • Average of squared differences between predicted and actual values • Sensitive to outliers • Average of absolute differences between predicted and actual values • Less sensitive to outliers but not differentiable at zero • Combines MSE and MAE benefits • Less sensitive to outliers and differentiable everywhere • Requires tuning of parameter δ
  • 26. CLASSIFICATION LOSS FUNCTIONS Used for evaluating how well predicted class labels match actual labels BINARY CROSS-ENTROPY LOSS (LOG LOSS) CATEGORICAL CROSS- ENTROPY LOSS SPARSE CATEGORICAL CROSS- ENTROPY LOSS • For binary classification (0 or 1) • Measures difference between predicted probabilities and actual labels • For multi-class classification with one-hot encoded labels Similar to Categorical Cross- Entropy but uses integer labels (not one-hot)
  • 27. OPTIMIZATIO N TECHNIQUES The optimizer’s role is to find the best combination of weights and biases that leads to the most accurate predictions.
  • 28. GRADIENT DESCENT Gradient Descent is an optimization algorithm is used to update the parameters of the ANN during training.
  • 29. • Cost function and Loss function are synonymous and used interchangeably. Conceptually they are slightly different. • Loss/Error is what we compute for a single training example/input sample. • A cost function, on the other hand, is the average loss/error over the entire training dataset (batch/mini-batch). • The optimization algorithms aim at minimizing the cost function". CONTOUR PLOT OF THE LOSS/ERROR SURFACE
  • 30. AVOIDING LOCAL MINIMA!!! If the learning rate is too large, we can overshoot If the learning rate is too small, convergence is very slow
  • 31. RANDOM WEIGHT INITIALIZATION • Random weight initialization ensures diverse starting points on the loss surface. • Training is empirical — multiple runs help reach better local minima. • Loss functions are often non-convex multiple → local minima. • In high-dimensional spaces, gradients rarely fall to exact zero we approximate minima. → • Local optima often yield good results; searching for global minima is unnecessary and computationally expensive.
  • 32. OPTIMIZE THE LOSS AND UPDATE THE MODEL PARAMETERS
  • 33. GRADIENT DESCENT AND ITS VARIANTS GRADIENT DESCENT Batch Gradient descent Stochastic Gradient descent Mini-batch Gradient descent Entire dataset for updation Single observation for updation Subset of data for updation
  • 34. BATCH, SDG & MINI-BATCH GD • Batch Gradient Descent uses the entire dataset to compute the gradient and update parameters, providing a smooth but sometimes slow convergence path. • Stochastic Gradient Descent (SGD) updates parameters using one data point at a time, resulting in a noisy but faster and more frequent update, which helps escape local minima. • Mini-Batch Gradient Descent strikes a balance by using small batches of data for each update, combining efficiency and stable convergence.
  • 35. BATCH, SDG & MINI-BATCH GD • Gradient descent involves computing the average over all n examples, which can be time-consuming when the training set grows. • Mini-batch Gradient Descent avoids this by sampling a mini-batch M of n′ examples from the training set. • The gradient estimate is computed from all pairs of examples in the mini-batch M, not from the training set. • This algorithm is known as Stochastic Gradient Descent (SGD) when n' contains just one training example.
  • 36. BATCH, SDG & MINI-BATCH GD If mini-batch size = 1: Stocastic Gradient Descent = every example (row) it is used as mini-batch: lose sped-up from vectorization. If mini-batch size = m: Batch Gradient Descent: (xu), YU}) = (X, Y) = the entire training set is used: size = n: too long interations.
  • 37. ADAPTIVE MOMENTUM (ADAM)OPTIMIZER Optimization algorithm that combines Momentum and RMSProp to adaptively adjust learning rates during training. It is efficient, requires minimal tuning, and performs well on large and complex datasets.
  • 38. EFFECT OF THE LR ON TRAINING • The learning rate (LR) controls how much the model's weights are updated during training. • A high LR may cause the model to overshoot the minimum, while a low LR can lead to slow convergence or getting stuck in local minima.
• 39. PROJECT DEMO This project predicts whether a user will generate revenue based on behavioral and technical features using an Artificial Neural Network (ANN). The model is trained on a customer behavior dataset. Customer Purchase Prediction Dataset: Dataset Source: UCI Online Shoppers Purchasing Intention Dataset Target Variable: Revenue (Yes / No) Total Samples: ~12,330 records Classes: 0 → No Revenue, 1 → Revenue Generated Train-Test Split: Training Samples: 80%, Testing Samples: 20% Note: Dropped less relevant features, encoded categorical data, and applied StandardScaler for normalization.
  • 41. Data Loading Load dataset and split features (X) and target (y).
  • 42. Dropping Unnecessary Features Drop less important features to simplify the model.
• 44. One-Hot Encoding Encode multiple categories without order using One-Hot Encoding.
  • 45. Data Splitting Split data into training and test sets.
  • 46. Feature Scaling Normalize features to improve model performance.
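A hedged pandas/scikit-learn sketch of the preprocessing steps above; the file name, the dropped columns, and the encoded columns are assumptions for illustration, not taken from the slides:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Illustrative pipeline; file name and column choices are assumptions.
df = pd.read_csv("online_shoppers_intention.csv")
df = df.drop(columns=["OperatingSystems", "Browser"])        # drop less relevant features
df = pd.get_dummies(df, columns=["Month", "VisitorType"])    # one-hot encode categoricals

X = df.drop(columns=["Revenue"]).astype(float).values
y = df["Revenue"].astype(int).values                          # target: Revenue (0/1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)                     # 80/20 split

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)                       # fit the scaler on train only
X_test = scaler.transform(X_test)
```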
  • 47. Model Architecture (ANN) Build a multi-layer ANN with dropout to avoid overfitting.
  • 48. Model Architecture (ANN) output The model has 5 dense layers with ReLU activation and dropout layers to reduce overfitting. The final dense layer uses sigmoid for binary classification. Total trainable parameters: 86,785.
  • 49. Model Compilation & Training Compile and train the ANN using Adam optimizer.
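A minimal Keras sketch consistent with the described architecture (five ReLU dense layers with dropout and a sigmoid output, trained with Adam); the exact layer widths, dropout rates, epochs, and batch size are assumptions and may not reproduce the 86,785 parameters reported on the previous slide:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

# Layer sizes are assumptions chosen to illustrate the described architecture.
model = Sequential([
    Dense(256, activation="relu", input_shape=(X_train.shape[1],)),
    Dropout(0.3),
    Dense(128, activation="relu"),
    Dropout(0.3),
    Dense(64, activation="relu"),
    Dense(32, activation="relu"),
    Dense(16, activation="relu"),
    Dense(1, activation="sigmoid"),            # binary output: Revenue yes/no
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(X_train, y_train, validation_split=0.2, epochs=50, batch_size=32)
```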
  • 50. Prediction & Evaluation Model achieved 83.45% accuracy, evaluated using a confusion matrix on test data.
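A short sketch of how the evaluation step might look, assuming the model and scaled test set from the previous sketches:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

y_prob = model.predict(X_test)                   # predicted probabilities
y_pred = (y_prob > 0.5).astype(int).ravel()      # threshold at 0.5 for class labels

print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))          # rows: actual, columns: predicted
```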
• 51. WHAT IS CNN? Convolutional Neural Networks (CNNs) are deep learning models designed to recognize patterns in images. They use special layers to automatically detect features like edges and shapes, making them very effective for tasks like image classification and object detection.
  • 52. MOTIVATION BEHIND USING CNN 1. Convolution leverages three important ideas to improve ML systems: i) Sparse interactions ii) Parameter sharing iii) Equivariant representations 2. Convolution also allows for working with inputs of variable size
• 53. SPARSE INTERACTION Sparse interaction means each neuron in a layer connects only to a small region of the previous layer (called the local receptive field), instead of all neurons. This helps CNNs focus on important local features in images without using too many connections. (Figure: sparse connectivity vs. dense connectivity.)
• 54. SPARSE INTERACTION Key features: • Sparse Connectivity → Local receptive fields: each neuron in layer s connects only to a few neighboring inputs (x). Example: s3 connects only to x2, x3, and x4, not all inputs. This is the local receptive field idea. • Local Neighbourhood Processing: because each neuron in s sees only a small region of the input space, it is only looking at nearby pixels/features. • Efficient Learning → Reduced input size: sparse connections mean fewer computations and less memory, which helps models learn faster and scale better.
• 55. SPARSE INTERACTION This diagram shows sparse connectivity across multiple layers, which is common in Convolutional Neural Networks (CNNs). Each neuron is connected only to a small, localized group of neurons from the previous layer, forming local receptive fields. As we move deeper through the layers (x → h → g), the receptive field increases, allowing the network to understand larger and more complex patterns while keeping the number of connections and computations efficient. This layered structure enables CNNs to extract both local and global features effectively.
• 56. PARAMETER SHARING • Same parameters (weights) are reused across different parts of the input. • Common in Convolutional Neural Networks (CNNs). • A kernel/filter slides across the image and applies the same weights. • Helps in detecting the same feature (like edges or eyes) anywhere in the image. • Reduces memory usage and improves efficiency. • Fewer parameters to learn → faster training and better generalization. The same filter detects both eyes, even though they are in different places.
• 57. PARAMETER SHARING • In CNNs, the same filter (kernel) is applied to every position of the input image. • This means each weight in the filter is reused across the entire input. Suppose: input size = m × n, kernel size = k × k. In a fully connected layer, parameters = m × n; in a convolutional layer, parameters = k × k (independent of input size). So the parameter count is reduced from m × n → k × k. • Runtime of forward propagation remains O(k × n) (efficient). • But the storage requirement drops to just k × k parameters (much smaller).
• 58. PARAMETER SHARING In the figure, the black arrows show how a single weight from a 3-element kernel is applied across different locations in the input. In contrast, the bottom part shows a fully connected model where each weight is used only once for one connection. Since fully connected layers do not share parameters, they need many more weights compared to convolutional layers.
  • 59. EDGE DETECTION • Edge detection is used to highlight the boundaries in an image — like where one object ends and another begins. • It is done using special filters (kernels) that detect changes in pixel intensity. • Two common filters are: Mx - for detecting vertical edges My - for detecting horizontal edges • CNNs have revolutionized edge detection by learning hierarchical features directly from data. • Traditional methods like Sobel and Canny rely on handcrafted filters, while CNN-based methods learn optimal filters during training.
  • 60. EDGE DETECTION • These filters are applied (convolved) with the input image. • The result is two output images showing detected edges in horizontal and vertical directions. • These are useful in tasks like object detection, segmentation, and feature extraction in CNNs.
  • 61. EDGE DETECTION In Convolutional Neural Networks (ConvNets), edge detection identifies patterns by matching small image sections with a filter. Blue and green boxes mark where the filter aligns with the image, spotting edges between black and white squares. This helps the network extract key features for tasks like image classification.
  • 62. SOBEL FILTERS Sobel filters are special 3x3 kernels used in image processing to detect edges in an image. They work by emphasizing differences in pixel values, which helps identify boundaries between objects (edges). There are two main types: 1.Horizontal Sobel Filter: Detects horizontal edges by looking for changes in pixel intensity along the vertical direction. 2. Vertical Sobel Filter: Detects vertical edges by looking for changes in pixel intensity along the horizontal direction.
  • 63. EQUIVARIANCE OF CONVOLUTION TO TRANSLATION Translational Equivariance or just equivariance is a very important property of the convolutional neural networks where the position of the object in the image should not be fixed for it to be detected by the CNN.
• 64. EQUIVARIANCE OF CONVOLUTION TO TRANSLATION • Equivariant means that if the input changes, the output changes in the same way. • A function f(x) is equivariant to a function g if f(g(x)) = g(f(x)). • If g is a function that translates the input, i.e., that shifts it, then the convolution function is equivariant to g. • I(x,y) is image brightness at point (x,y). • I’ = g(I) is the image function with I’(x,y) = I(x-1,y), i.e., it shifts every pixel of I one unit to the right. • If we apply g to I and then apply convolution, the output will be the same as if we applied convolution to I and then applied transformation g to the output.
  • 66. CONVOLUTION: THE MATH BEHIND THE MATCH • Line up the feature and the image patch. • Multiply each image pixel by the corresponding feature/filter pixel (element-wise product). • Add them up. • Divide by the total number of pixels in the feature.
  • 67. FILTERING: THE MATH BEHIND THE MATCH A kernel (small grid) matches with a part of the image by multiplying their values. The green 2x2 kernel (1, 1, -1, 1) overlaps with a yellow 2x2 image patch (1, 1, 1, 1). Multiply each pair: 1×1 + 1×1 + (-1)×1 + 1×1 = 1 + 1 - 1 + 1 = 2. This result (2) shows how well the kernel matches the image patch, helping detect features like edges.
  • 68. FILTERING: THE MATH BEHIND THE MATCH After multiplying and adding the kernel (1, 1, -1, 1) with the image patch (1, 1, 1, 1), the result is 2. This value goes into a new grid called a feature map, shown in the blue box (top-left corner). As the kernel slides over the entire image, it fills the feature map with values, highlighting where patterns like edges are found.
  • 69. FILTERING: THE MATH BEHIND THE MATCH The kernel (1, 1, -1, 1) slides to the next image patch (1, -1, 1, 1). Multiply and add: 1×1 + 1×(-1) + (-1)×1 + 1×1 = 1 - 1 - 1 + 1 = 0. This result (0) is placed in the feature map, next to the previous value (1). The kernel keeps sliding to fill the feature map with values that show where patterns match.
  • 70. FILTERING: THE MATH BEHIND THE MATCH The kernel (1, 1, -1, 1) moves to the next image patch (1, -1, 1, 1). Multiply and add: 1×1 + 1×(-1) + (-1)×1 + 1×1 = 1 - 1 - 1 + 1 = 0. This result (0) is added to the feature map, following the previous values (1, 1). The kernel keeps sliding across the image to build the feature map, showing where patterns match.
  • 71. FILTERING: THE MATH BEHIND THE MATCH The kernel (1, 1, -1, 1) slides to the next image patch (1, 1, 1, 1). Multiply and add: 1×1 + 1×1 + (-1)×1 + 1×1 = 1 + 1 - 1 + 1 = 2. This result (2) is added to the feature map, following the previous values (1, 1, 1). The kernel keeps moving across the image to complete the feature map, revealing pattern matches.
  • 72. FILTERING: THE MATH BEHIND THE MATCH The kernel (1, 1, -1, 1) slides to the next image patch (1, 1, 1, 1). Multiply and add: 1×1 + 1×1 + (-1)×1 + 1×1 = 1 + 1 - 1 + 1 = 2. This result (2) is added to the feature map, following the previous values (1, 1, 1, 1). The kernel keeps sliding to fill the feature map, showing where patterns match.
  • 73. FILTERING: THE MATH BEHIND THE MATCH The kernel (1, 1, -1, 1) slides to the next image patch (1, 1, 1, 1). Multiply and add: 1×1 + 1×1 + (-1)×1 + 1×1 = 1 + 1 - 1 + 1 = 2. This result (2) is added to the feature map, following the previous values (1, 1, 1, 1, 1). The kernel keeps sliding to fill the feature map, showing where patterns match.
• 74. FILTERING: THE MATH BEHIND THE MATCH A 3x3 kernel (1, -1, -1, 1, 1, -1, 1, -1, 1) overlaps with a 3x3 image patch (1, 1, 1, 1, 1, 1, 1, 1, 1). Multiply each pair and add: 1×1 + (-1)×1 + (-1)×1 + 1×1 + 1×1 + (-1)×1 + 1×1 + (-1)×1 + 1×1 = 1 - 1 - 1 + 1 + 1 - 1 + 1 - 1 + 1 = 1. Divide by 9 (the total number of pixels in the kernel) to get approximately 0.11. This value goes into the feature map.
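A small NumPy sketch of this sliding-window computation ('valid' convolution with stride 1); the example image and kernel values are illustrative, not the exact grids from the figures:

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the
    element-wise products at every position, as in the walkthrough above."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)   # divide by kernel.size for the
                                                 # normalized "match score" variant
    return out

image = np.array([[1, 1, 1, 1],
                  [1, -1, 1, 1],
                  [1, 1, 1, 1],
                  [1, 1, 1, 1]], dtype=float)
kernel = np.array([[1, 1],
                   [-1, 1]], dtype=float)
print(convolve2d_valid(image, kernel))           # 3x3 feature map
```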
  • 75. CONVOLUTION: APPLYING A SHARPEN FILTER A 3x3 sharpen filter (-1, -1, -1, -1, 8, -1, -1, -1, -1) slides over a 5x5 image. For the green patch (21, 19, 17, 71, 76, 73, 153, 164, 164), multiply and add: (-1)×21 + (-1)×19 + (-1)×17 + (-1)×71 + 8×76 + (-1)×73 + (-1)×153 + (-1)×164 + (-1)×164 = - 21 - 19 - 17 - 71 + 608 - 73 - 153 - 164 - 164 = -74. This result (-74) goes into the feature map, highlighting edges in the image.
  • 76. FILTERS AND CONVOLUTIONAL FEATURE MAP • Filters (or kernels) are small grids of numbers used to find patterns in an image, like edges or textures. They slide over the image in a process called convolution, calculating new values at each step. This creates a feature map that highlights the patterns found. • In this example, two Sobel filters are used: one detects horizontal edges, and the other detects vertical edges. These feature maps help in understanding important parts of the image and are widely used in computer vision and deep learning.
  • 77. APPLYING A BANK OF FILTERS A filter bank contains multiple filters that detect different patterns in an image. When applied through convolution, they create feature maps that highlight edges, textures, and shapes. These filters are learned automatically during training to help the model understand the image.
• 78. CONVOLUTION MULTIPLE CHANNELS Convolution works with multiple input channels, such as a color image with Red, Green, and Blue (RGB) layers. Each filter is also made up of multiple layers, one for each channel. The filter is applied across all channels of the input, and the results are summed to produce a single value in the output feature map. This process is repeated for each filter to create multiple output channels. This technique helps capture more complex features by combining information from all color channels.
• 79. CONVOLUTION MULTIPLE CHANNELS This example shows convolution with multiple channels using a 3D input volume of size 7×7×3 (height × width × channels). Each filter (W0 and W1) is also 3D (3×3×3) to match the input depth. 1. The input has 3 channels (like an RGB image), shown in blue boxes. 2. Filter W0 (in red) is applied across all 3 input channels. For each position: • Each 3×3 slice from the input is multiplied with the matching filter slice. • The results from all 3 slices are summed. • A bias (b0) is added to produce a single number. 3. This process is repeated across the image to produce one output channel (green numbers). 4. Similarly, Filter W1 produces a second output channel using the same steps. 5. The final output volume is 3×3×2, where 2 is the number of filters used.
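A NumPy sketch of multi-channel convolution along these lines; the stride of 2 and the random input values are assumptions chosen so the output is 3×3×2, as in the example:

```python
import numpy as np

def conv_multichannel(volume, filters, biases, stride=2):
    """volume: (H, W, C); filters: (F, k, k, C); biases: (F,).
    Each filter spans all input channels; products are summed over the whole
    k x k x C patch and a bias is added, giving one number per output position."""
    H, W, C = volume.shape
    F, k, _, _ = filters.shape
    H_out = (H - k) // stride + 1
    W_out = (W - k) // stride + 1
    out = np.zeros((H_out, W_out, F))
    for f in range(F):
        for i in range(H_out):
            for j in range(W_out):
                patch = volume[i*stride:i*stride+k, j*stride:j*stride+k, :]
                out[i, j, f] = np.sum(patch * filters[f]) + biases[f]
    return out

x = np.random.randn(7, 7, 3)        # 7x7x3 input volume
w = np.random.randn(2, 3, 3, 3)     # two 3x3x3 filters (W0 and W1)
b = np.zeros(2)                     # biases b0, b1
print(conv_multichannel(x, w, b, stride=2).shape)  # (3, 3, 2)
```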
• 80. CNN APPLYING AN ACTIVATION FUNCTION For a neuron at position (p,q) in the hidden layer: Output(p,q) = f( Σ_i Σ_j w_ij · x_(i+p)(j+q) + b ), where: • x_(i+p)(j+q): input values from a 4×4 patch • w_ij: weights from the filter • b: bias term • f: activation function (e.g., ReLU). This equation describes: 1. Taking a weighted sum of the input patch 2. Adding a bias 3. Passing the result through an activation function
  • 81. CONVOLUTION LAYER This is how a convolution layer works with a 3D input. The input image is of size 32×32×3 (e.g., RGB), and the filter size is 5×5×3. The filter always matches the depth of the input. It slides over the image spatially (width and height), computing dot products at each position. This operation helps extract meaningful features like edges and textures from the input.
• 82. CONVOLUTION LAYER A 5×5×3 filter is applied to a 32×32×3 input image by taking a small 5×5×3 patch (75 values) and computing a dot product with the filter weights. After adding a bias, this gives one output value. This operation is repeated across the entire image. Mathematically, it's represented as wᵀx + b.
• 83. CONVOLUTION LAYER A 5×5×3 filter is applied to a 32×32×3 input image by taking a small 5×5×3 patch (75 values) and computing a dot product with the filter weights. After adding a bias, this gives one output value. This operation is repeated across the entire image. Mathematically, it's represented as wᵀx + b, where: • wᵀ: transpose of the weight vector • x: input vector • b: bias term
  • 84. CONVOLUTION LAYER A 5×5×3 filter is slid over a 32×32×3 input volume to compute dot products at each spatial location. Each computation produces a single number, and this operation is repeated across the entire input. The result is a 28×28×1 activation map, where the reduced spatial dimensions are due to the filter size and no padding being applied. This activation map captures local patterns from the input using the learned filter.
  • 85. CONVOLUTION LAYER By using a second 5×5×3 filter, an additional 28×28 activation map is generated. Each filter captures different features from the same input volume. Stacking the results from both filters creates multiple activation maps, allowing deeper representation of the input. This helps the network learn various patterns like edges, textures, or colors.
  • 86. CONVOLUTION LAYER Using 6 filters of size 5×5, each spanning the full depth of the input (3 channels), produces 6 distinct 28×28 activation maps. These maps are stacked depth-wise to form a new volume of size 28×28×6, which serves as the output of the convolution layer. Each filter learns to detect different features from the same input.
• 87. POOLING LAYER Pooling layers in CNNs shrink the feature map’s width and height while keeping important features. A small window slides over the feature map, summarizing each region. For a feature map of dimensions h × w × c, a pooling window of size f, and stride s, the output dimensions are ((h − f)/s + 1) × ((w − f)/s + 1) × c. Types of pooling layers: • Max Pooling • Average Pooling
• 88. SPATIAL DIMENSION - STRIDES “Spatial dimension strides” refers to how the filter moves across the input during convolution. The stride is the number of pixels the filter shifts at each step. It directly affects the size of the output (feature map): larger strides result in smaller outputs.
  • 89. SPATIAL DIMENSION - STRIDES A 5×5×3 filter slides over a 32×32×3 input with a stride of 1. At each location, it computes a dot product, resulting in one value in the output. This process continues across all spatial positions, generating a 28×28 activation map. The reduction in size is due to the filter not being applied on the border pixels, which reduces the width and height by 4 (32 - 5 + 1 = 28).
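The spatial output size follows the standard formula (W − F + 2P) / S + 1; a one-line helper (the function name is illustrative) confirms the 32 → 28 example above:

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Spatial size of a convolution output: (W - F + 2P) / S + 1."""
    return (input_size - filter_size + 2 * padding) // stride + 1

print(conv_output_size(32, 5, stride=1, padding=0))  # 28, matching the example above
```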
  • 90. MAX POOLING: SHRINKING THE DIMENSION • Max Pooling shrinks a feature map by sliding a small window over it and taking the largest value in each region. It reduces the size, keeps important features, and makes the CNN faster and more efficient. • A 2x2 window slides over a 4x4 feature map with a stride of 2. In each region, the largest value is taken: [1, 1, 5, 6] becomes 6, [2, 4, 7, 8] becomes 8, [3, 2, 1, 2] becomes 3, and [1, 0, 3, 4] becomes 4. The result is a smaller 2x2 feature map [6, 8, 3, 4]. This reduces the size, keeps key features, and makes the CNN more efficient.
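A NumPy sketch of 2x2 max pooling with stride 2, reproducing the 4x4 → 2x2 example above (the helper name is illustrative):

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Slide a size x size window with the given stride and keep the maximum value."""
    h, w = x.shape
    oh, ow = (h - size) // stride + 1, (w - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i*stride:i*stride+size, j*stride:j*stride+size].max()
    return out

fmap = np.array([[1, 1, 2, 4],
                 [5, 6, 7, 8],
                 [3, 2, 1, 0],
                 [1, 2, 3, 4]])
print(max_pool2d(fmap))   # [[6. 8.] [3. 4.]], as in the example above
```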
  • 91. AVERAGE POOLING: SHRINKING THE DIMENSION • Average Pooling shrinks a feature map by sliding a small window over it and taking the average value of each region, reducing size while summarizing features. • A 2x2 window slides over a 4x4 feature map with a stride of 2. In each region, the average value is taken: [4, 3, 1, 3] becomes 2.8, [1, 5, 4, 8] becomes 4.5, [4, 5, 6, 5] becomes 5.0, and [4, 3, 9, 4] becomes 5.0. The result is a smaller 2x2 feature map [2.8, 4.5, 5.0, 5.0]. This reduces the size, summarizes features, and makes the CNN more efficient.
  • 92. CONVOLUTION – REDUCTION IN FEATURE DIMENSION A 3x3 filter slides over a 4x4 input with padding to keep the output size the same. The input (4x4) is padded with zeros around the edges, making it effectively larger, and the filter processes each region to produce a 4x4 output. This maintains the size but extracts features, preparing the data for further steps like pooling.
• 93. ‘VALID’ PADDING – REDUCTION IN FEATURE DIMENSION • Valid padding in convolution reduces the output size compared to the input: no extra zeros are added around the edges. • A 2x2 filter slides over a 4x4 input with valid padding and a stride of 1, producing a 3x3 output: [1.25, 0.5, 0.5, 0.5, 0.75, 1.5, 0.25, 1.25, 1]. This reduces the size of the feature map while extracting key features for the CNN.
• 94. ‘SAME’ PADDING – NO REDUCTION IN FEATURE DIMENSION • Same padding in convolution maintains the output size by adding zeros around the edges of the input. • A 2x2 filter slides over a 4x4 input with same padding and a stride of 1, producing a 4x4 output: [0.5, 0, 0.25, 0.25, 0, 1.25, 0.5, 0.5, 0, 0.5, 0.75, 1.5, 0.5, 0.25, 1.25, 1]. This preserves the size while extracting features for the CNN.
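In Keras, these two behaviours correspond to the padding='valid' and padding='same' arguments of Conv2D; a quick sketch (the input size and filter count are illustrative):

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D

# 'valid' shrinks the feature map, 'same' zero-pads the input to preserve it.
inp = Input(shape=(32, 32, 3))
valid_out = Conv2D(6, kernel_size=5, padding="valid")(inp)   # -> (None, 28, 28, 6)
same_out = Conv2D(6, kernel_size=5, padding="same")(inp)     # -> (None, 32, 32, 6)
print(Model(inp, [valid_out, same_out]).output_shape)
```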
  • 95. TYPICAL PROCESSING BLOCKS IN A CNN A CNN takes input feature maps, applies filters in the convolution layer to create convolution feature maps with features like edges, then uses the pooling layer to shrink them into smaller pooling feature maps. These layers work together to process images efficiently.
  • 97. CNN ARCHITECTURE A Convolutional Neural Network (CNN) is made up of layers that automatically learn features from images. It usually starts with convolution layers that apply filters to detect patterns like edges or textures. Then, pooling layers reduce the size of the data while keeping important information. After several layers of convolution and pooling, the output is flattened into a vector and passed through fully connected layers, which make the final prediction. CNNs are widely used in image classification and other computer vision tasks because they can learn to focus on important parts of the image.
  • 98. FULLY-CONNECTED LAYERS Fully-connected layers in a CNN are similar to those in a traditional neural network (MLP). They connect every neuron in one layer to every neuron in the next. After convolution and pooling layers extract features, the fully-connected layer takes these features, flattens them into a 1D vector, and uses them for final classification. This part of the network learns patterns and makes predictions like digit labels or object categories.
• 100. CLASSIFICATION WITH FC LAYERS After the convolution and pooling layers, we add fully connected layers at the end to enable the network to learn complex patterns from the feature maps generated by the previous convolutional layers. • The CNN layers act as a feature extractor. • The FC layers act as a feature classifier.
  • 101. CONVNET (IMAGE CLASSIFICATION) = CNN FEATURE EXTRACTOR + FC LAYER FEATURE CLASSIFIER
  • 102. DROPOUT LAYER A Dropout layer helps prevent overfitting by randomly turning off some neurons during training. This forces the model to learn more robust features, improving its ability to generalize well on new, unseen data.
• 103. PROJECT DEMO This project classifies images into predefined categories using a Convolutional Neural Network (CNN) trained on the Vehicle Image Classification Dataset. Vehicles Multiclass Classification Dataset: The dataset consists of 7 classes: Auto Rickshaws, Bikes, Cars, Motorcycles, Planes, Ships, Trains. Total images taken: 5590, split into Training: 3906, Testing: 839, and Validation: 845. Note: Missing images may result from file corruption, naming conflicts, path errors, split issues, or preprocessing losses.
  • 105. Train/Validation/Test Split - Sample Code Images are split into train, validate, and test sets using train_test_split() with fixed random_state. Each image is copied into class-wise folders using shutil.copy2(), preserving structure for model training.
  • 106. Pre-processing - Sample Code ImageDataGenerator loads images from specified directories (flow_from_directory), applies rescale and target_size, and categorizes them into 7 classes as per class_mode.
  • 107. CNN - Sample Code Conv2D layers extract features, MaxPooling2D downsizes, Flatten prepares for Dense layers, Dropout reduces overfitting, and softmax outputs num_classes probabilities; Adam optimizes with categorical_crossentropy for multi-class classification.
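A hedged Keras sketch consistent with this description and the summary on the next slide (Conv2D blocks with 32/64/128 filters, max pooling, flatten, Dense(128), dropout, softmax over 7 classes); the input size, kernel sizes, and dropout rate are assumptions and may not reproduce the exact parameter count reported:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

num_classes = 7   # Auto Rickshaws, Bikes, Cars, Motorcycles, Planes, Ships, Trains

# Input shape and exact layer order are assumptions reconstructed from the summary.
model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(150, 150, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation="relu"),
    Dropout(0.5),                                  # reduces overfitting
    Dense(num_classes, activation="softmax"),      # class probabilities
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```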
  • 108. CNN - Sample Code Output The Model: "sequential" has layers: conv2d (32, 64, 128 filters), max_pooling2d, flatten, dense (128, 7 units), and dropout; total params: 4,829,319 (trainable params: 4,829,319).
  • 109. Training the Model with Data Generators - Sample Code model.fit() runs training on train_generator data for 20 epochs, evaluating performance on validation_data from validate_generator.
  • 110. Model Evaluation - Sample Code model.evaluate() assesses the model on test_generator, computing loss with categorical_crossentropy and accuracy as metrics, processing all batches (35/35).
  • 111. Plotting Training and Validation Metrics - Sample Code The code uses plt.figure(figsize=(12, 4)), plt.subplot(), and plt.plot() to graph history.history['accuracy'] (Train Acc), history.history['val_accuracy'] (Val Acc), history.history['loss'] (Train Loss), and history.history['val_loss'] (Val Loss) with plt.legend() and plt.title().
  • 112. Plotting Training and Validation Metrics - Sample Code output The Accuracy plot shows Train Acc rising to ~0.9, Val Acc fluctuating around 0.7-0.8, indicating overfitting; the Loss plot shows Train Loss dropping to ~0.2, Val Loss decreasing but unstable at ~0.4, suggesting inconsistent validation performance.
  • 113. Confusion Matrix - Sample Code confusion_matrix compares true_classes and predicted_classes, sns.heatmap visualizes errors with class_labels on axes, and classification_report provides precision, recall, and F1-score per class.
  • 114. Confusion Matrix - Sample Code Output The Confusion Matrix displays prediction accuracy for 7 classes; high diagonal values (e.g., 142 Auto Rickshaws, 159 Bikes) show correct predictions, but errors occur, like 26 Planes and 22 Trains misclassified as Ships.
  • 115. Confusion Report - Sample Code Output The Classification Report shows Bikes (0.97) and Motorcycles (0.91) with highest f1-scores, Planes (0.81) and Trains (0.83) lowest; overall accuracy is 0.87 for 1,116 samples.
  • 116. Testing - Sample Code It processes a single image for classification, rescaling and predicting its class as "Bikes" using the trained model. Predicted Correctly