MACHINE LEARNING
(22ISE62)
Module - 4
Dr. Shivashankar
Professor
Department of Information Science & Engineering
GLOBAL ACADEMY OF TECHNOLOGY-Bengaluru
Course Outcomes
After Completion of the course, student will be able to:
22ISE62.1: Describe the machine learning techniques, their types and data analysis framework.
22ISE62.2: Apply mathematical concepts for feature engineering and perform dimensionality
reduction to enhance model performance.
22ISE62.3: Develop similarity-based learning models and regression models for solving
classification and prediction tasks.
22ISE62.4: Build probabilistic learning models and design neural network models using perceptron
and multilayer architectures.
22ISE62.5: Utilize clustering algorithms to identify patterns in data and implement reinforcement
learning techniques.
Text Book:
1. S Sridhar, M Vijayalakshmi, “Machine Learning”, OXFORD University Press 2021, First Edition.
2. Murty, M. N., and V. S. Ananthanarayana. Machine Learning: Theory and Practice, Universities Press, 2024.
3. T. M. Mitchell, “Machine Learning”, McGraw Hill, 1997.
4. Burkov, Andriy. The hundred-page machine learning book. Vol. 1. Quebec City, QC, Canada: Andriy Burkov,
2019.
Module 4: Bayesian Learning
• Bayesian Learning is a learning method that describes and represents knowledge in an uncertain
domain and provides a way to reason about this knowledge using probability measure.
• It uses Bayes theorem to infer the unknown parameters of a model.
• Bayesian inference is useful in many applications which involve reasoning and diagnosis such as game
theory, medicine, etc.
• Bayesian inference is much more powerful in handling missing data and for estimating any uncertainty
in predictions.
Introduction to Probability-based Learning
• Probability-based learning is one of the most important practical learning methods which combines
prior knowledge or prior probabilities with observed data.
• Probabilistic learning uses the concept of probability theory that describes how to model randomness,
uncertainty, and noise to predict future events.
• It is a tool for modelling large datasets and uses Bayes rule to infer unknown quantities, predict and
learn from data.
• In a probabilistic model, randomness plays a major role and the solution is given as a probability distribution,
while in a deterministic model there is no randomness: given the same initial conditions, the model produces the
same single outcome every time it is run.
Fundamentals of Bayes Theorem
Bayes Theorem - A formula that describes the probability of an event, given that another event has
already occurred.
In machine learning, it's crucial for updating probabilities based on new evidence and making predictions in
situations where there's uncertainty.
Prior Probability
It is the general probability of an uncertain event before an observation is seen, or some evidence is
collected. It is the initial probability that is believed before any new information is collected.
Likelihood Probability
• Likelihood probability is the relative probability of the observation occurring for each class or the
sampling density for the evidence given the hypothesis.
• It is stated as P (Evidence | Hypothesis), which denotes the likeliness of the occurrence of the evidence
given the parameters.
Posterior Probability
• It is the updated or revised probability of an event taking into account the observations from the training
data. P (Hypothesis | Evidence) is the posterior distribution representing the belief about the hypothesis,
given the evidence from the training data.
• Therefore, the posterior probability combines the prior probability with the new evidence: posterior ∝ likelihood × prior.
Classification Using Bayes Model
• It calculates the conditional probability of an event A given that event B has occurred (P(A|B)).
• Generally, Bayes theorem is used to select the most probable hypothesis from data, considering
both prior knowledge and posterior distributions.
• It is based on the calculation of the posterior probability and is stated as:
P (Hypothesis h | Evidence E)
• where, Hypothesis h is the target class to be classified and Evidence E is the given test instance.
• P (Hypothesis h| Evidence E) is calculated from the prior probability P (Hypothesis h), the
likelihood probability P (Evidence E |Hypothesis h) and the marginal probability P (Evidence E).
• It can be written as:
P (Hypothesis h | Evidence E) = [P (Evidence E | Hypothesis h) × P (Hypothesis h)] / P (Evidence E)
where, P (Hypothesis h) is the prior probability of the hypothesis h without observing the training
data or considering any evidence.
P (Evidence E | Hypothesis h) is the likelihood probability of Evidence E given Hypothesis h, and P (Evidence E) is the marginal probability of the evidence.
Conti..
Maximum A Posteriori (MAP) Hypothesis, ℎ𝑀𝐴𝑃 :
• Hypothesis is a proposed explanation or assumption about the relationship between input data
(features) and output predictions.
• It's a model or mapping that the algorithm uses to predict outcomes based on given inputs.
• This most probable hypothesis is called the Maximum A Posteriori Hypothesis ℎ𝑀𝐴𝑃. Bayes theorem Eq.
can be used to find the ℎ𝑀𝐴𝑃.
h_MAP = argmax_{h ∈ H} P (Hypothesis h | Evidence E) = argmax_{h ∈ H} P (Evidence E | Hypothesis h) × P (Hypothesis h)
Maximum Likelihood (ML) Hypothesis, ℎ𝑀𝐿 :
• Given a set of candidate hypotheses, if every hypothesis is equally probable, only P (E | h) is used to find
the most probable hypothesis.
• The hypothesis that gives the maximum likelihood for P (E | h) is called the Maximum Likelihood (ML)
Hypothesis, ℎ𝑀𝐿.
h_ML = argmax_{h ∈ H} P (Evidence E | Hypothesis h)
Conti..
• Correctness of Bayes Theorem: Consider two events A and B in a sample space S.
A: T F T T F T T F
B: F T T F T F T F
Solution:
P (A) = 5/8
P (B) = 4/8
P (A | B) = P (A ∩ B) / P (B) = 2/4
P (B | A) = P (A ∩ B) / P (A) = 2/5
Checking with Bayes theorem:
P (A | B) = P (B | A) P (A) / P (B) = (2/5 × 5/8) / (4/8) = 2/4
P (B | A) = P (A | B) P (B) / P (A) = (2/4 × 4/8) / (5/8) = 2/5
Conti..
• Problem 2: Consider a boy who has a volleyball tournament on the next day, but today he feels
sick. Since he is a healthy boy, falling sick is unusual: there is only a 40% chance that he falls sick.
Find the probability that the boy participates in the tournament given that he is sick. The boy is very much
interested in volleyball, so there is a 90% probability that he would participate in tournaments
and a 20% probability that he falls sick given that he participates in the tournament.
• Solution:
P (Boy participating in the tournament) = 90%
P (He is sick | Boy participating in the tournament) = 20%
P (He is Sick) = 40%
The probability of the boy participating in the tournament given that he is sick is:
P (Boy participating in the tournament | He is sick) = P (Boy participating in the tournament) × P
(He is sick | Boy participating in the tournament)/P (He is Sick)
P (Boy participating in the tournament | He is sick) = (0.9 × 0.2)/0.4 = 0.45
Hence, 45% is the probability that the boy will participate in the tournament given that he is sick.
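As a quick cross-check of this arithmetic, here is a minimal Python sketch of the same Bayes-rule calculation (variable names are illustrative, not from the text):

# Bayes rule for the volleyball example
p_participate = 0.90           # P(participates in the tournament)
p_sick_given_part = 0.20       # P(sick | participates)
p_sick = 0.40                  # P(sick)

# P(participates | sick) = P(sick | participates) * P(participates) / P(sick)
p_part_given_sick = p_sick_given_part * p_participate / p_sick
print(p_part_given_sick)       # 0.45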
NAÏVE BAYES ALGORITHM
• It is a supervised binary class or multi class classification algorithm that works on the principle of Bayes
theorem.
• It's considered "naive" because it makes a strong, often unrealistic, assumption: that features are conditionally
independent given the class label.
• This means that the presence or absence of one feature doesn't affect the presence or absence of other
features, which simplifies the calculations.
• It particularly works for a large dataset and is very fast. It is one of the most effective and simple classification
algorithms.
• This algorithm considers all features to be independent of each other even though they are individually
dependent on the classified object.
• Each of the features contributes a probability value independently during classification and hence this
algorithm is called as Naïve algorithm.
• Some important applications of these algorithms are text classification, recommendation system and face
recognition.
Algorithm: Naïve Bayes
1. Compute the prior probability for the target class.
2. Compute Frequency matrix and likelihood Probability for each of the feature.
3. Use Bayes theorem Eq. (8.1) to calculate the probability of all hypotheses.
4. Use the Maximum A Posteriori (MAP) Hypothesis, h_MAP, to classify the test object (a code sketch of these steps follows).
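The four steps can be sketched in a few lines of Python for categorical features. This is only a minimal illustration; the data layout, helper names and the zero-probability behaviour noted in the comment are assumptions, not part of the textbook algorithm.

from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Compute the prior and likelihood tables from categorical training data."""
    priors = {c: n / len(labels) for c, n in Counter(labels).items()}
    # likelihood[class][feature_index][value] = P(value | class)
    likelihood = defaultdict(lambda: defaultdict(Counter))
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            likelihood[c][i][v] += 1
    for c in likelihood:
        for i in likelihood[c]:
            total = sum(likelihood[c][i].values())
            likelihood[c][i] = {v: n / total for v, n in likelihood[c][i].items()}
    return priors, likelihood

def classify(test_row, priors, likelihood):
    """Return the MAP class: argmax_c P(c) * prod_i P(x_i | c)."""
    scores = {}
    for c, prior in priors.items():
        p = prior
        for i, v in enumerate(test_row):
            p *= likelihood[c][i].get(v, 0.0)   # 0 if value unseen (zero-probability issue)
        scores[c] = p
    return max(scores, key=scores.get), scores

Feeding the rows and labels of Table 8.1 to train_naive_bayes and classifying the test instance of Example 8.2 should reproduce the hand computation shown below.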
Conti..
Example 8.2: Assess a student’s performance using Naïve Bayes algorithm with the dataset
provided in Table 8.1. Predict whether a student gets a job offer or not in his final year of the
course.
Table 8.1: Training Dataset
Sl. No.   CGPA   Interactiveness   Practical knowledge   Communication skill   Job offer
1 ≥ 9 Yes Very good Good Yes
2 ≥ 8 No Good Moderate Yes
3 ≥ 9 No Average Poor No
4 ≥ 8 No Average Good No
5 ≥ 8 Yes Good Moderate Yes
6 ≥ 9 Yes Good Moderate Yes
7 ≥ 8 Yes Good Poor No
8 ≥ 9 No Very good Good Yes
9 ≥ 8 Yes Good Good Yes
10 ≥ 8 Yes Average Good Yes
Conti..
Solution:
Step 1: Compute the prior probability for the target feature ‘Job Offer’. The target feature ‘Job Offer’ has
two classes, ‘Yes’ and ‘No’. It is a binary classification problem. Given a student instance, we need to classify
whether ‘Job Offer = Yes’ or ‘Job Offer = No’.
From the training dataset, we observe that the frequency or the number of instances with ‘Job Offer = Yes’
is 7 and ‘Job Offer = No’ is 3.
The prior probability for the target feature is calculated by dividing the number of instances belonging to a
particular target class by the total number of instances.
Hence, the prior probability for ‘Job Offer = Yes’ is 7/10 and ‘Job Offer = No’ is 3/10 as shown in Table 8.2.
Table 8.2: Frequency Matrix and Prior Probability of Job Offer

Job offer classes   No. of instances   Probability
Yes                 7                  P(Job offer = Yes) = 7/10
No                  3                  P(Job offer = No) = 3/10

Step 2: Compute the frequency matrix and likelihood probability for each of the features.
Step 2(a): Feature – CGPA
Conti..
Table 8.3 shows the frequency matrix for the feature CGPA.
Table 8.3: Frequency Matrix of CGPA

CGPA    Job offer = Yes   Job offer = No
≥ 9     3                 1
≥ 8     4                 0
< 8     0                 2
Total   7                 3

Table 8.4 shows how the likelihood probability is calculated for CGPA using conditional probability.
Table 8.4: Likelihood Probability of CGPA

CGPA    P(CGPA | Job offer = Yes)                    P(CGPA | Job offer = No)
≥ 9     P (CGPA ≥ 9 | Job Offer = Yes) = 3/7         P (CGPA ≥ 9 | Job Offer = No) = 1/3
≥ 8     P (CGPA ≥ 8 | Job Offer = Yes) = 4/7         P (CGPA ≥ 8 | Job Offer = No) = 0/3
< 8     P (CGPA < 8 | Job Offer = Yes) = 0/7         P (CGPA < 8 | Job Offer = No) = 2/3

From the Table 8.3 Frequency Matrix of CGPA, the number of instances with ‘CGPA ≥ 9’ and ‘Job Offer = Yes’ is 3.
The total number of instances with ‘Job Offer = Yes’ is 7.
Hence, P (CGPA ≥ 9 | Job Offer = Yes) = 3/7.
Conti..
• Step 2(b): Feature – Interactiveness
Table 8.5 shows the frequency matrix for the feature Interactiveness.
Table 8.5: Frequency Matrix of Interactiveness

Interactiveness   Job offer = Yes   Job offer = No
Yes               5                 1
No                2                 2
Total             7                 3

Table 8.6 shows how the likelihood probability is calculated for Interactiveness using conditional probability.
Table 8.6: Likelihood Probability of Interactiveness

Interactiveness   P(Interactiveness | Job offer = Yes)                 P(Interactiveness | Job offer = No)
Yes               P (Interactiveness = Yes | Job Offer = Yes) = 5/7    P (Interactiveness = Yes | Job Offer = No) = 1/3
No                P (Interactiveness = No | Job Offer = Yes) = 2/7     P (Interactiveness = No | Job Offer = No) = 2/3
Conti..
Step 2(c): Feature – Practical Knowledge
Table 8.7 shows the frequency matrix for the feature Practical Knowledge.
Table 8.7: Frequency Matrix of Practical Knowledge

Practical knowledge   Job offer = Yes   Job offer = No
Very good             2                 0
Average               1                 2
Good                  4                 1
Total                 7                 3

Table 8.8: Likelihood Probability of Practical Knowledge

Practical knowledge   P(Practical knowledge | Job offer = Yes)                       P(Practical knowledge | Job offer = No)
Very good             P (Practical Knowledge = Very Good | Job Offer = Yes) = 2/7    P (Practical Knowledge = Very Good | Job Offer = No) = 0/3
Average               P (Practical Knowledge = Average | Job Offer = Yes) = 1/7      P (Practical Knowledge = Average | Job Offer = No) = 2/3
Good                  P (Practical Knowledge = Good | Job Offer = Yes) = 4/7         P (Practical Knowledge = Good | Job Offer = No) = 1/3
Conti..
Step 2(d): Feature – Communication Skills
Table 8.9 shows the frequency matrix for the feature Communication Skills.
Table 8.9: Frequency Matrix of Communication Skills

Communication skill   Job offer = Yes   Job offer = No
Good                  4                 1
Moderate              3                 0
Poor                  0                 2
Total                 7                 3

Table 8.10: Likelihood Probability of Communication Skills

Communication skill   P(Communication skill | Job offer = Yes)                       P(Communication skill | Job offer = No)
Good                  P (Communication Skills = Good | Job Offer = Yes) = 4/7        P (Communication Skills = Good | Job Offer = No) = 1/3
Moderate              P (Communication Skills = Moderate | Job Offer = Yes) = 3/7    P (Communication Skills = Moderate | Job Offer = No) = 0/3
Poor                  P (Communication Skills = Poor | Job Offer = Yes) = 0/7        P (Communication Skills = Poor | Job Offer = No) = 2/3
Conti..
Step 3: Use the Bayes theorem equation,
P (Hypothesis h | Evidence E) = [P (Evidence E | Hypothesis h) × P (Hypothesis h)] / P (Evidence E),
to calculate the probability of all hypotheses.
Given the test data = (CGPA ≥ 9, Interactiveness = Yes, Practical knowledge = Average, Communication Skills = Good):
P (Job Offer = Yes | Test data) = [P (CGPA ≥ 9 | Job Offer = Yes) × P (Interactiveness = Yes | Job Offer = Yes) × P (Practical
knowledge = Average | Job Offer = Yes) × P (Communication Skills = Good | Job Offer = Yes) × P (Job Offer = Yes)] / P (Test Data)
The numerator is
P (CGPA ≥ 9 | Job Offer = Yes) × P (Interactiveness = Yes | Job Offer = Yes) × P (Practical knowledge = Average | Job Offer = Yes) ×
P (Communication Skills = Good | Job Offer = Yes) × P (Job Offer = Yes)
= 3/7 × 5/7 × 1/7 × 4/7 × 7/10 = 0.0175
Similarly, for the other case ‘Job Offer = No’,
P (Job Offer = No| Test data) = (P(CGPA ≥9 |Job Offer =No) P (Interactiveness = Yes | Job Offer = No) P (Practical
knowledge = Average | Job Offer = No) P (Communication Skills = Good | Job Offer =No) P (Job Offer = No))/(P(Test
Data)).
P (CGPA ≥9 |Job Offer = No) P (Interactiveness = Yes | Job Offer = No) P (Practical knowledge = Average | Job Offer = No)
P (Communication Skills = Good | Job Offer = No) P (Job Offer = No)
= 1/3 × 1/3 × 2/3 × 1/3 × 3/10 = 0.0074
Step 4: Use Maximum A Posteriori (MAP) Hypothesis, ℎ𝑀𝐴𝑃 to classify the test object to the hypothesis with the highest
probability.
Since P (Job Offer = Yes | Test data) has the higher probability value (0.0175 > 0.0074), the test data is classified as ‘Job
Offer = Yes’.
Conti…
Zero Probability Error
• In the previous problem data set, consider the test data to be (CGPA ≥8, Interactiveness = Yes, Practical knowledge =
Average, Communication Skills = Good)
When computing the posterior probability,
P (Job Offer = Yes | Test data) = [P (CGPA ≥ 8 | Job Offer = Yes) × P (Interactiveness = Yes | Job Offer = Yes) × P (Practical knowledge =
Average | Job Offer = Yes) × P (Communication Skills = Good | Job Offer = Yes) × P (Job Offer = Yes)] / P (Test Data)
The numerator is
P (CGPA ≥ 8 | Job Offer = Yes) × P (Interactiveness = Yes | Job Offer = Yes) × P (Practical knowledge = Average | Job Offer = Yes) ×
P (Communication Skills = Good | Job Offer = Yes) × P (Job Offer = Yes)
= 4/7 × 5/7 × 1/7 × 4/7 × 7/10 = 0.0233
Similarly, for the other case ‘Job Offer = No’,
When we compute the probability:
P (Job Offer = No| Test data) = (P(CGPA ≥8 |Job Offer = No) P (Interactiveness = Yes | Job Offer = No) P (Practical knowledge =
Average | Job Offer = No) P (Communication Skills = Good | Job Offer = No) P (Job Offer = No))/(P(Test Data))
= P (CGPA ≥8 |Job Offer =No) P (Interactiveness = Yes | Job Offer = No) P (Practical knowledge = Average | Job Offer = No) P
(Communication Skills = Good | Job Offer =No) P (Job Offer = No)
= 0/3 × 1/3 × 2/3 × 1/3 × 3/10
= 0
Since the probability value is zero, the model fails to predict; this is called a zero-probability error.
This problem arises because there are no instances in Table 8.1 with the attribute value CGPA ≥ 8 and Job Offer = No,
and hence the probability value of this case is zero.
Conti..
This zero-probability error can be solved by applying a smoothing technique called Laplace correction. For example, given
1000 data instances in the training dataset, if there are zero instances for a particular value of a feature, we can add 1
instance for each attribute–value pair of that feature; this makes little difference for 1000 data instances, and the overall
probability no longer becomes zero.
Table 8.11: Scaled Values to 1000 without Laplace Correction

CGPA   P (Job Offer = Yes)                            P (Job Offer = No)
≥ 9    P (CGPA ≥ 9 | Job Offer = Yes) = 300/700       P (CGPA ≥ 9 | Job Offer = No) = 100/300
≥ 8    P (CGPA ≥ 8 | Job Offer = Yes) = 400/700       P (CGPA ≥ 8 | Job Offer = No) = 0/300
< 8    P (CGPA < 8 | Job Offer = Yes) = 0/700         P (CGPA < 8 | Job Offer = No) = 200/300

Now, add 1 instance for each CGPA–value pair for ‘Job Offer = No’. Then,
P (CGPA ≥ 9 | Job Offer = No) = 101/303 = 0.333
P (CGPA ≥ 8 | Job Offer = No) = 1/303 = 0.0033
P (CGPA < 8 | Job Offer = No) = 201/303 = 0.6634
With values scaled to 1003 data instances, we get
P (Job Offer = Yes | Test data) = P (CGPA ≥ 8 | Job Offer = Yes) × P (Interactiveness = Yes | Job Offer = Yes) ×
P (Practical knowledge = Average | Job Offer = Yes) × P (Communication Skills = Good | Job Offer = Yes) × P (Job Offer = Yes)
= 400/700 × 500/700 × 100/700 × 400/700 × 700/1003 = 0.02325
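In code, the same idea is usually implemented as add-one (Laplace) smoothing of the likelihood estimates. The sketch below is an illustrative assumption about how one might code it, equivalent in spirit to adding one instance per attribute–value pair:

def smoothed_likelihood(count_value_and_class, count_class, n_values):
    """Add-one (Laplace) smoothed estimate of P(value | class).

    count_value_and_class: number of training instances with this attribute value and this class
    count_class:           number of training instances with this class
    n_values:              number of distinct values the attribute can take
    """
    return (count_value_and_class + 1) / (count_class + n_values)

# Zero-probability case from the text: CGPA >= 8 with Job Offer = No.
# The raw estimate would be 0/3; the smoothed estimate stays non-zero.
print(smoothed_likelihood(0, 3, 3))   # 1/6, approximately 0.167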
Problem 1: Apply the naive Bayes classifier to a concept learning problem, classifying days according to
whether someone will play tennis, for the instance {outlook = sunny, temperature = cool, humidity = high, wind = strong}. (A worked sketch follows the table below.)
Day Outlook Temperature Humidity Wind Play_Tennis
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild high Strong No
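One way this practice problem could be worked out: the counts below are read directly from the Play_Tennis table (9 ‘Yes’ days, 5 ‘No’ days), but treat the short Python sketch itself as illustrative rather than the official solution.

# Priors from the table: 9 Yes, 5 No out of 14 days
p_yes, p_no = 9/14, 5/14

# Likelihoods for the query {Sunny, Cool, High, Strong}, counted from the table
yes = p_yes * (2/9) * (3/9) * (3/9) * (3/9)   # P(Sunny|Yes) P(Cool|Yes) P(High|Yes) P(Strong|Yes)
no  = p_no  * (3/5) * (1/5) * (4/5) * (3/5)   # P(Sunny|No)  P(Cool|No)  P(High|No)  P(Strong|No)

print(round(yes, 4), round(no, 4))                     # about 0.0053 vs 0.0206
print("Play_Tennis =", "Yes" if yes > no else "No")    # No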
Cont…
Problem 2: Estimate the conditional probabilities of each attributes {color, legs, height, smelly} for the species classes
{M,H} using the data set given in the table. Using these probabilities estimate the probability values for the new instance
{color=green, legs=2, height=tall and smelly=No}.
No Color Legs Height Smelly Species
1 White 3 Short Yes M
2 Green 2 Tall No M
3 Green 3 Short Yes M
4 White 3 Short Yes M
5 Green 2 Short No H
6 White 2 Tall No H
7 White 2 Tall No H
8 White 2 Short Yes H
Bayes Optimal Classifier
• Bayes optimal classifier is a probabilistic model which, in fact, uses the Bayes theorem to find the
most probable classification for a new instance, given the training data, by combining the
predictions of all posterior hypotheses.
• This is different from the Maximum A Posteriori (MAP) Hypothesis, h_MAP, which chooses the single
most probable hypothesis.
• Here, a new instance is classified to the possible classification value C_i that maximizes:
C_i = argmax_{C_i} Σ_{h_i ∈ H} P(C_i | h_i) P(h_i | T)
Conti..
• Example 8.3: Given the hypothesis space with 4 hypothesis ℎ1, ℎ2, ℎ3 and ℎ4. Determine if the patient is
diagnosed as COVID positive or COVID negative using Bayes Optimal classifier.
Table 8.12: Posterior Probability Values

P(h_i | T)   P(COVID Positive | h_i)   P(COVID Negative | h_i)
0.3          0                         1
0.1          1                         0
0.2          1                         0
0.1          1                         0

Solution: h_MAP chooses h1, which has the maximum probability value 0.3, as the solution and gives the result
that the patient is COVID negative. But the Bayes Optimal classifier combines the predictions of h2, h3 and h4,
which sum to 0.4, and gives the result that the patient is COVID positive.
Σ_{h_i ∈ H} P(COVID Negative | h_i) P(h_i | T) = 1 × 0.3 = 0.3
Σ_{h_i ∈ H} P(COVID Positive | h_i) P(h_i | T) = 1 × 0.1 + 1 × 0.2 + 1 × 0.1 = 0.4
Therefore, argmax over C_i ∈ {COVID Positive, COVID Negative} of Σ_{h_i ∈ H} P(C_i | h_i) P(h_i | T) = COVID Positive.
Thus, this algorithm diagnoses the new instance to be COVID positive.
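A minimal Python sketch of this combination for the values in Table 8.12 (variable names are illustrative):

# P(h_i | T) and per-hypothesis class probabilities from Table 8.12
p_h = [0.3, 0.1, 0.2, 0.1]
p_pos_given_h = [0, 1, 1, 1]   # P(COVID Positive | h_i)
p_neg_given_h = [1, 0, 0, 0]   # P(COVID Negative | h_i)

pos = sum(p * ph for p, ph in zip(p_pos_given_h, p_h))   # 0.4
neg = sum(p * ph for p, ph in zip(p_neg_given_h, p_h))   # 0.3
print("COVID Positive" if pos > neg else "COVID Negative")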
NAÏVE BAYES ALGORITHM FOR CONTINUOUS ATTRIBUTES
1. Gaussian Naive Bayes:
• This approach assumes that the probability distribution of each continuous attribute is a
Gaussian (normal) distribution.
• It calculates the probability of a given feature value belonging to a specific class based on the
Gaussian probability density function (PDF).
• The mean (μ) and standard deviation (σ) of the Gaussian distribution are estimated from the
training data for each class.
• The likelihood of a feature value x given a class y is calculated using the Gaussian formula:
P(x | y) = (1 / (σ √(2π))) × exp( −(x − μ)² / (2σ²) )
Where:
P(x | y) is the likelihood of feature value x given class y,
μ is the mean of the Gaussian distribution for class y,
σ is the standard deviation of the Gaussian distribution for class y,
exp is the exponential function, and π ≈ 3.14159.
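A minimal sketch of this likelihood computation, assuming the class-conditional mean and standard deviation have already been estimated from the training data:

import math

def gaussian_likelihood(x, mu, sigma):
    """P(x | y) under a normal distribution with class-conditional mean mu and std dev sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Example with the values computed later in this module for 'Job Offer = Yes':
print(gaussian_likelihood(8.5, 8.814, 0.581))   # about 0.594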
Conti..
2. Discretization:
• Continuous attributes can be converted into discrete categories by creating intervals or bins.
• For example, a temperature value could be classified into categories like "low", "medium", or "high".
• Different discretization methods can be used, such as:
• Equal-width binning: Dividing the range of the attribute into equal-sized intervals.
• Equal-frequency binning: Dividing the range into intervals such that each interval contains the same
number of data points.
• Quartiles: Dividing the data into four groups based on percentiles (25th, 50th, 75th).
• Once discretized, the attributes can be treated as discrete variables in the Naive Bayes algorithm, as sketched below.
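A small equal-width binning sketch with NumPy; the temperature values and bin labels are illustrative assumptions:

import numpy as np

temps = np.array([12.0, 18.5, 21.0, 26.5, 31.0, 35.5])

# Equal-width binning: split the observed range into 3 equal-sized intervals
edges = np.linspace(temps.min(), temps.max(), num=4)    # 3 bins need 4 edges
labels = np.array(["low", "medium", "high"])
bins = np.digitize(temps, edges[1:-1])                  # bin index 0, 1 or 2 for each value
print(list(zip(temps, labels[bins])))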
Conti..
Problem 1: Assess a student’s performance using Naïve Bayes algorithm for the continuous attribute.
Predict whether a student gets a job offer or not in his final year of the course. The training dataset T
consists of 10 data instances with attributes such as ‘CGPA’ and ‘Interactiveness’ as shown in Table 8.13. The
target variable is Job Offer which is classified as Yes or No for a candidate student.
Table 8.13: Training Dataset with Continuous Attribute
Sl. No.   CGPA   Interactiveness   Job offer
1 9.5 Yes Yes
2 8.2 No Yes
3 9.3 No No
4 7.6 No No
5 8.4 Yes Yes
6 9.1 Yes Yes
7 7.5 Yes No
8 9.6 No Yes
9 8.6 Yes Yes
10 8.3 Yes Yes
Conti..
• Solution: Step 1: Compute the prior probability for the target feature ‘Job Offer’.
• Table 8.14: Prior Probability of Target Class

Job offer classes   No. of instances   Probability value
Yes                 7                  P (Job Offer = Yes) = 7/10
No                  3                  P (Job Offer = No) = 3/10

• Step 2: Compute the frequency matrix and likelihood probability for each of the features.
• The Gaussian distribution for a continuous feature is calculated using the formula:
P(X_i = x_k | C_j) = g(x_k; μ_ij, σ_ij)
where X_i is the i-th continuous attribute in the given dataset and x_k is a value of the attribute.
C_j denotes the j-th class of the target feature. μ_ij denotes the mean of the values of the continuous attribute X_i
with respect to class j of the target feature, and σ_ij denotes the standard deviation of the values of X_i with respect
to class j of the target feature. Hence, the normal distribution formula is given as:
P(X_i = x_k | C_j) = (1 / (σ_ij √(2π))) × exp( −(x_k − μ_ij)² / (2σ_ij²) )
Conti..
Step 2(a): Consider the feature CGPA
To calculate the likelihood probability for this continuous attribute, first compute the mean and standard
deviation for CGPA with respect to the target class ‘Job Offer’.
Here, 𝑋𝑖 = CGPA
𝐶𝑗 = ‘Job Offer = Yes’ Mean and Standard Deviation for class ‘Job Offer = Yes’ are given as:
Mean = µCGPA − YES = (9.5+8.2+8.4+9.1+9.6+8.6+8.3)/7 = 8.814286
σ_ij = σ_CGPA-YES = sqrt( Σ (x_i − μ)² / (N − 1) ) = 0.58146
Mean and Standard Deviation for class ‘Job Offer = No’ are given as:
Cj = ‘Job Offer = No’
µij = µCGPA − NO = 8.13333
σij = σCGPA − NO = 1.011599
Once Mean and Standard Deviation are computed, the likelihood probability for any test value using Gaussian
distribution formula can be calculated.
Conti..
Step 2(b): Consider the feature Interactiveness
Table 8.15: Frequency Matrix of Interactiveness

Interactiveness   Job offer = Yes   Job offer = No
Yes               5                 1
No                2                 2
Total             7                 3

Table 8.16 shows how the likelihood probability is calculated for Interactiveness using conditional probability.
Table 8.16: Likelihood Probability of Interactiveness

Interactiveness   P(Interactiveness | Job offer = Yes)                 P(Interactiveness | Job offer = No)
Yes               P (Interactiveness = Yes | Job Offer = Yes) = 5/7    P (Interactiveness = Yes | Job Offer = No) = 1/3
No                P (Interactiveness = No | Job Offer = Yes) = 2/7     P (Interactiveness = No | Job Offer = No) = 2/3
Conti..
Step 3: Use Bayes theorem to calculate the probability of all hypotheses.
Consider the test data to be (CGPA = 8.5, Interactiveness = Yes).
For the hypothesis ‘Job Offer = Yes’:
P (Job Offer = Yes | Test data) = (P(CGPA = 8.5 | Job Offer = Yes) × P (Interactiveness = Yes | Job Offer = Yes) ×
P (Job Offer = Yes)
To compute P (CGPA = 8.5 | Job Offer = Yes), use the Gaussian distribution formula:
P(X_i = x_k | C_j) = g(x_k; μ_ij, σ_ij)
P(X_CGPA = 8.5 | C_Job Offer = Yes) = (1 / (σ_CGPA-YES √(2π))) × exp( −(8.5 − μ_CGPA-YES)² / (2σ_CGPA-YES²) )
P(CGPA = 8.5 | Job offer = Yes) = g(x_k = 8.5; μ_ij = 8.814, σ_ij = 0.581) = 0.594
P (Interactiveness = Yes|Job Offer = Yes) = 5/7
P (Job Offer = Yes) = 7/10
Hence: P (Job Offer = Yes | Test data) = (P(CGPA = 8.5 | Job Offer = Yes) × P (Interactiveness = Yes|Job Offer
= Yes) × P (Job Offer = Yes) = 0.594 × 5/7 × 7/10 = 0.297
Conti..
Similarly, for the hypothesis ‘Job Offer = No’:
P (Job Offer = No | Test data) = P (CGPA = 8.5 | Job Offer = No) × P (Interactiveness = Yes | Job Offer = No) ×
P (Job Offer = No)
P(CGPA = 8.5 | Job offer = No) = g(x_k = 8.5; μ_ij = 8.133, σ_ij = 1.0116)
P(X_CGPA = 8.5 | C_Job Offer = No) = (1 / (σ_CGPA-NO √(2π))) × exp( −(8.5 − μ_CGPA-NO)² / (2σ_CGPA-NO²) ) = 0.369
P (Interactiveness = Yes | Job Offer = No) = 1/3
P (Job Offer = No) = 3/10
Hence,
P (Job Offer = No | Test data) = P (CGPA = 8.5 | Job Offer = No) P (Interactiveness = Yes | Job Offer = No) × P
(Job Offer = No) = 0.369 × 1/3 × 3/10 = 0.0369
Step 4: Use Maximum A Posteriori (MAP) Hypothesis, ℎ𝑀𝐴𝑃 to classify the test object to the hypothesis with
the highest probability.
Since P (Job Offer = Yes | Test data) has the highest probability value of 0.297, the test data is classified as
‘Job Offer = Yes’.
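The whole calculation for this example can be reproduced with a short sketch; the CGPA values are copied from Table 8.13 and the test instance is (CGPA = 8.5, Interactiveness = Yes), while the function and variable names are illustrative:

import math

def gaussian(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def mean_std(values):
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / (len(values) - 1)   # sample variance with N - 1
    return mu, math.sqrt(var)

cgpa_yes = [9.5, 8.2, 8.4, 9.1, 9.6, 8.6, 8.3]    # CGPA values with Job Offer = Yes
cgpa_no = [9.3, 7.6, 7.5]                          # CGPA values with Job Offer = No
mu_y, sd_y = mean_std(cgpa_yes)     # about 8.814, 0.581
mu_n, sd_n = mean_std(cgpa_no)      # about 8.133, 1.012

p_yes = gaussian(8.5, mu_y, sd_y) * (5/7) * (7/10)   # about 0.297
p_no = gaussian(8.5, mu_n, sd_n) * (1/3) * (3/10)    # about 0.037
print("Job Offer =", "Yes" if p_yes > p_no else "No")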
Conti..
Problem 2: Take a real-time example of predicting the result of a student using Naïve Bayes algorithm. The
training dataset T consists of 8 data instances with attributes such as ‘Assessment’, ‘Assignment’, ‘Project’ and
‘Seminar’ as shown in Table 8.17. The target variable is Result which is classified as Pass or Fail for a candidate
student. Given a test data to be (Assessment = Average, Assignment = Yes, Project = No and Seminar = Good),
predict the result of the student.
Table 8.17: Training Dataset
P(Pass | data) ≈ 0.008
P(Fail | data) ≈ 0.00617
• Prediction: Pass
Sl. No. Assessment Assignment Project Seminar Result
1 Good Yes Yes Good Pass
2 Average Yes No Poor Fail
3 Good No Yes Good Pass
4 Average No No Poor Fail
5 Average No Yes Good Pass
6 Good No No Poor Pass
7 Average Yes Yes Good Fail
8 Good Yes Yes Poor Pass
Conti..
Problem 3: Take a real-time example of predicting whether a car is stolen using the Naïve Bayes algorithm. The
training dataset T consists of 10 data instances with attributes such as ‘Color’, ‘Type’ and ‘Origin’ as shown in the table.
The target variable is Stolen, which is classified as Yes or No.
Table 4.12: Dataset
Example No. Color Type Origin Stolen
1 Red Sports Domestic Yes
2 Red Sports Domestic No
3 Red Sports Domestic Yes
4 Yellow Sports Domestic No
5 Yellow Sports Imported Yes
6 Yellow SUV Imported No
7 Yellow SUV Imported Yes
8 Yellow SUV Domestic No
9 Red SUV Imported No
10 Red Sports Imported Yes
Artificial Neural Networks
• A neural network is a type of machine learning algorithm inspired by the human brain.
• It's a powerful tool that excels at solving complex problems that are difficult for traditional computer algorithms to
handle, such as image recognition and natural language processing.
• Artificial Neural Networks (ANNs) are computational models inspired by the biological neural networks of the
human brain.
• They are used in machine learning to analyze data and make predictions by processing information through
interconnected nodes, or "neurons".
• The human brain constitutes a mass of neurons that are all connected as a network, which is actually a directed
graph.
• These neurons are the processing units which receive information, process it and then transmit this data to
other neurons that allows humans to learn almost any task.
• ANN is a learning mechanism that models a human brain to solve any non-linear and complex problem. Each
neuron is modelled as a computing unit, or simply called as a node in ANN, that is capable of doing complex
calculations.
• ANN is a system that consists of many such computing units operating in parallel that can learn from
observations.
• Some typical applications of ANN in the field of computer science are Natural Language Processing (NLP),
pattern recognition, face recognition, speech recognition, character recognition, text processing, stock
prediction, computer vision, etc.
• ANNs also have been considerably used in other engineering fields such as Chemical industry, Medicine,
Robotics, Communications, Banking, and Marketing.
Conti..
• The human nervous system has billions of neurons, the processing units that enable humans
to perceive things, to hear, to see and to smell.
• It enables us to remember, recognize and correlate things around us. It is a learning system that
consists of functional units called nerve cells, typically called neurons.
• The human nervous system is divided into two sections called the Central Nervous System (CNS)
and the Peripheral Nervous System (PNS).
• The brain and the spinal cord constitute the CNS and the neurons inside and outside the CNS
constitute the PNS. The neurons are basically classified into three types called sensory neurons,
motor neurons and interneurons.
• Sensory neurons get information from different parts of the body and bring it into the CNS,
whereas motor neurons receive information from other neurons and transmit commands to the
body parts.
• The CNS consists of only interneurons which connect one neuron to another neuron by
receiving information from one neuron and transmitting it to another.
• The basic functionality of a neuron is to receive information, process it and then transmit it to
another neuron or to a body part.
Biological Neurons
A typical biological neuron has four parts called dendrites, soma, axon and synapse.
The body of the neuron is called as soma.
• Dendrites accept the input information and process it in the cell body called soma.
• A single neuron is connected by axons to around 10,000 neurons and through these axons the processed
information is passed from one neuron to another neuron.
• A neuron gets fired if the input information crosses a threshold value and transmits signals to another
neuron through a synapse.
• A synapse gets fired with an electrical impulse called spikes which are transmitted to another neuron.
• A single neuron can receive synaptic inputs from one neuron or multiple neurons.
• These neurons form a network structure which processes input information and gives out a response.
Figure 10.1: A Biological Neuron
Artificial Neurons
• Artificial neurons model biological neurons and are called nodes.
• A node or a neuron can receive one or more input information and process it.
• Artificial neurons or nodes are connected by connection links to one another.
• Each connection link is associated with a synaptic weight.
Figure 10.2: Artificial Neurons
Simple Model of an Artificial Neuron
The first mathematical model of a biological neuron was designed by McCulloch & Pitts in 1943. It includes
two steps:
1. It receives weighted inputs from other neurons
2. It operates with a threshold function or activation function
• The received inputs are computed as a weighted sum which is given to the activation function and if the
sum exceeds the threshold value the neuron gets fired.
• The neuron is the basic processing unit that receives a set of inputs 𝑥1, 𝑥2, 𝑥3,… 𝑥𝑛and their associated
weights 𝑤1, 𝑤2, 𝑤3,…. 𝑤𝑛.
• The Summation function ‘Net-sum’ Eq. (10.1) computes the weighted sum of the inputs received by the
neuron.
Net-sum = Σ_{i=1}^{n} x_i w_i          (10.1)
The activation function is a binary step function which outputs a value 1 if the Net-sum is above the
threshold value θ, and 0 if the Net-sum is below the threshold value θ.
Therefore, the activation function is applied to Net-sum as shown in the below equation:
f(x) = Activation function (Net-sum)
Then, the output of a neuron is:
y = 1 if f(x) ≥ θ, and 0 if f(x) < θ
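A minimal sketch of this two-step McCulloch-Pitts model; the inputs, weights and threshold below are made-up illustrative values:

def mcculloch_pitts_neuron(inputs, weights, theta):
    """Fire (output 1) when the weighted sum of inputs reaches the threshold theta."""
    net_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net_sum >= theta else 0

# Illustrative example: three inputs with equal weights and threshold 1.0
print(mcculloch_pitts_neuron([1, 0, 1], [0.5, 0.5, 0.5], theta=1.0))   # 1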
Artificial Neural Network Structure
• Artificial Neural Network (ANN) imitates a human brain, which exhibits some intelligence.
• It has a network structure represented as a directed graph with a set of neuron nodes and connection links or edges
connecting the nodes.
• The nodes in the graph are arrayed in a layered manner and can process information in parallel. The network given in
the figure has three layers called input layer, hidden layer and output layer. The input layer receives the input
information (𝑥1, 𝑥2, 𝑥3,… 𝑥𝑛) and passes it to the nodes in the hidden layer.
• The edges connecting the nodes from the input layer to the hidden layer are associated with synaptic weights called
as connection weights.
• These computing nodes or neurons perform some computations based on the input information (𝑥1, 𝑥2, 𝑥3,… 𝑥𝑛)
received and if the weighted sum of the inputs to a neuron is above the threshold or the activation level of the
neuron, then the neuron fires.
• Each neuron employs an activation function that determines the output of the neuron.
Figure 10.4: Artificial Neural Network Structure
Activation Function
Activation functions are mathematical functions associated with each neuron in the neural network that map input
signals to output signals.
It decides whether to fire a neuron or not based on the input signals the neuron receives.
These functions normalize the output value of each neuron either between 0 and 1 or between -1 and +1.
Below are some of the activation functions used in ANNs:
1. Identity Function or Linear Function f(x) = x ∀x
The value of f(x) increases linearly or proportionally with the value of x. This function is useful when we do not want to
apply any threshold. The output would be just the weighted sum of input values. The output value ranges between -∞
and +∞.
2. Binary Step Function:
f(x) = 1 if x ≥ θ; 0 if x < θ
The output value is binary, i.e., 0 or 1, based on the threshold value θ. If the value of x is greater than or equal to θ, it
outputs 1, or else it outputs 0.
3. Bipolar Step Function:
f(x) = +1 if x ≥ θ; −1 if x < θ
The output value is bipolar, i.e., +1 or −1, based on the threshold value θ. If the value of x is greater than or equal to θ, it
outputs +1, or else it outputs −1.
Conti..
4. Sigmoidal Function or Logistic Function
σ(x) = 1 / (1 + e^(−x))
It is a widely used non-linear activation function which produces an S-shaped curve, and the output values
are in the range of 0 and 1.
5. Bipolar Sigmoid Function
σ(x) = (1 − e^(−x)) / (1 + e^(−x))
It outputs values between −1 and +1.
6. Ramp Function
f(x) = 1 if x > 1; x if 0 ≤ x ≤ 1; 0 if x < 0
It is a linear function whose upper and lower limits are fixed.
7. Tanh – Hyperbolic Tangent Function
The Tanh function is a scaled version of the sigmoid function which is also non-linear. It also suffers from the
vanishing gradient problem. The output values range between −1 and +1.
tanh(x) = 2 / (1 + e^(−2x)) − 1
Conti..
8. ReLu – Rectified Linear Unit Function
This activation function is a typical function generally used in deep learning neural network models in the
hidden layers. It avoids or reduces the vanishing gradient problem. This function outputs a value of 0 for
negative input values and works like a linear function if the input values are positive.
r(x) = max(0, x), i.e., f(x) = x if x ≥ 0; 0 if x < 0
9. Softmax Function
This is a non-linear function used in the output layer that can handle multiple classes.
It calculates the probability of each target class, which ranges between 0 and 1.
The probability of the input belonging to a particular class is computed by dividing the exponential of the
given input value by the sum of the exponential values of all the inputs.
S(x_i) = e^(x_i) / Σ_{j=1}^{k} e^(x_j)
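The functions above can be written in a few lines of NumPy; this is a hedged illustration of the formulas, not a library API:

import numpy as np

def binary_step(x, theta=0.0):
    return np.where(x >= theta, 1, 0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bipolar_sigmoid(x):
    return (1.0 - np.exp(-x)) / (1.0 + np.exp(-x))

def tanh(x):
    return 2.0 / (1.0 + np.exp(-2 * x)) - 1.0

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    e = np.exp(x - np.max(x))       # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))   # probabilities that sum to 1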
PERCEPTRON AND LEARNING THEORY
• A perceptron is a fundamental unit in neural networks, essentially a model of a biological neuron.
• It's a binary classifier that takes multiple inputs, applies weights and a bias, and then uses an activation
function to produce a single output, typically 0 or 1.
• The perceptron algorithm learns by adjusting the weights to minimize the error between its prediction
and the desired output.
The perceptron model consists of 4 steps:
1. Inputs from other neurons
2. Weights and bias
3. Net sum
4. Activation function
The summation function ‘Net-sum’ computes the weighted sum of the inputs received by the neuron:
Net-sum = Σ_{i=1}^{n} x_i w_i
Conti..
After computing the ‘Net-sum’, bias value is added to it and inserted in the activation function as shown below:
f(x) = Activation function (Net-sum + bias)
The activation function is a binary step function which outputs a value 1 if f(x) is above the threshold value θ, and
a 0 if f(x) is below the threshold value θ.
Then, the output of a neuron is: y = 1 if f(x) ≥ θ; 0 if f(x) < θ
Set the initial weights w1, w2, w3, ..., wn and bias θ to a random value in the range [−0.5, 0.5].
For each Epoch,
1. Compute the weighted sum by multiplying the inputs with the weights and adding the products.
2. Apply the activation function on the weighted sum:
   Y = Step ((x1 w1 + x2 w2) − θ)
3. If the sum is above the threshold value, output the value as positive, else output the value as negative.
4. Calculate the error by subtracting the estimated output Y_estimated from the desired output Y_desired:
   Error e(t) = Y_desired − Y_estimated
5. Update the weights if there is an error: Δw_i = α × e(t) × x_i,
   where x_i is the input value, e(t) is the error at step t, α is the learning rate and Δw_i is the difference in weight
   that has to be added to w_i.
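The rule above can be written as a short training loop. The sketch below uses the AND-gate data and the initial values from the worked problem that follows, so treat it as an illustrative cross-check rather than the textbook's code:

def step(x):
    return 1 if x >= 0 else 0

def train_perceptron(data, w1, w2, theta, alpha, epochs=10):
    """Perceptron training rule: adjust weights whenever the step output differs from the target."""
    for _ in range(epochs):
        changed = False
        for x1, x2, y_desired in data:
            y_est = step(x1 * w1 + x2 * w2 - theta)
            error = y_desired - y_est
            if error != 0:
                w1 += alpha * error * x1
                w2 += alpha * error * x2
                changed = True
        if not changed:          # converged: a full epoch with no weight update
            break
    return w1, w2

AND_DATA = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
print(train_perceptron(AND_DATA, w1=0.3, w2=-0.2, theta=0.4, alpha=0.2))   # approximately (0.3, 0.2)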
Conti..
• Problem 1: Consider a perceptron to represent the Boolean function AND with the initial weights w1 =
0.3, w2 = −0.2, learning rate α = 0.2 and bias θ = 0.4 as shown in Figure 4.5. The activation function used
here is the step function f(x), which gives a binary output, i.e., 0 or 1: if the value of f(x) is greater
than or equal to 0, it outputs 1, or else it outputs 0. Design a perceptron that performs the Boolean
function AND and update the weights until the perceptron gives the desired output.
Figure 4.5: Perceptron for Boolean function AND.
Conti..
• Solution: The desired output for the Boolean function AND is shown in Table 4.1.
Table 4.1: AND Truth Table

x1   x2   Y_des
0    0    0
0    1    0
1    0    0
1    1    1

Table 4.2: Epoch 1

Epoch   x1   x2   Y_des   Y_est                                     Error   w1    w2     Status
1       0    0    0       Step ((0 × 0.3 + 0 × -0.2) - 0.4) = 0     0       0.3   -0.2   No change
        0    1    0       Step ((0 × 0.3 + 1 × -0.2) - 0.4) = 0     0       0.3   -0.2   No change
        1    0    0       Step ((1 × 0.3 + 0 × -0.2) - 0.4) = 0     0       0.3   -0.2   No change
        1    1    1       Step ((1 × 0.3 + 1 × -0.2) - 0.4) = 0     1       0.5   0      Change
Conti..
For input (1, 1) the weights are updated as follows:
∆𝑤𝑖 = 𝛼 𝑋 𝑒 𝑡 𝑋 𝑥1 = 0.2 × 1 × 1 = 0.2
𝑤1 = 𝑤1 +∆𝑤1= 0.3 + 0.2 = 0.5
∆𝑤2 = 𝛼 𝑋 𝑒 𝑡 𝑋 𝑥2 = 0.2 × 1 × 1 = 0.2
𝑤2 = 𝑤2 +∆𝑤2= -0.2 + 0.2 = 0
Table 4.3: Epoch 2

Epoch   x1   x2   Y_des   Y_est                                   Error   w1    w2    Status
2       0    0    0       Step ((0 × 0.5 + 0 × 0) - 0.4) = 0      0       0.5   0     No change
        0    1    0       Step ((0 × 0.5 + 1 × 0) - 0.4) = 0      0       0.5   0     No change
        1    0    0       Step ((1 × 0.5 + 0 × 0) - 0.4) = 1      -1      0.3   0     Change
        1    1    1       Step ((1 × 0.3 + 1 × 0) - 0.4) = 0      1       0.5   0.2   Change
Conti..
For input (1, 0) the weights are updated as follows:
∆𝑤1 = 𝛼 𝑋 𝑒 𝑡 𝑋 𝑥1 = 0.2 × -1 × 1 = -0.2
𝑤1 = 𝑤1 +∆𝑤1= 0.5 - 0.2 = 0.3
∆𝑤2 = 𝛼 𝑋 𝑒 𝑡 𝑋 𝑥2 = 0.2 × -1 × 0 = 0
𝑤2 = 𝑤2 +∆𝑤2= 0 + 0 = 0
For input (1, 1), the weights are updated as follows:
∆𝑤𝑖 = 𝛼 𝑋 𝑒 𝑡 𝑋 𝑥1 = 0.2 × 1 × 1 = 0.2
𝑤1 = 𝑤1 +∆𝑤1= 0.3 + 0.2 = 0.5
∆𝑤2 = 𝛼 𝑋 𝑒 𝑡 𝑋 𝑥2 = 0.2 × 1 × 1 = 0.2
𝑤2 = 𝑤2 +∆𝑤2= 0 + 0.2 = 0.2
Table 4.4: Epoch 3

Epoch   x1   x2   Y_des   Y_est                                     Error   w1    w2    Status
3       0    0    0       Step ((0 × 0.5 + 0 × 0.2) - 0.4) = 0      0       0.5   0.2   No change
        0    1    0       Step ((0 × 0.5 + 1 × 0.2) - 0.4) = 0      0       0.5   0.2   No change
        1    0    0       Step ((1 × 0.5 + 0 × 0.2) - 0.4) = 1      -1      0.3   0.2   Change
        1    1    1       Step ((1 × 0.3 + 1 × 0.2) - 0.4) = 1      0       0.3   0.2   No change
Conti..
For input (1, 0) the weights are updated as follows:
∆𝑤1 = 𝛼 𝑋 𝑒 𝑡 𝑋 𝑥1 = 0.2 × -1 × 1 = -0.2
𝑤1 = 𝑤1 +∆𝑤1= 0.5 - 0.2 = 0.3
∆𝑤2 = 𝛼 𝑋 𝑒 𝑡 𝑋 𝑥2 = 0.2 × -1 × 0 = 0
𝑤2 = 𝑤2 +∆𝑤2= 0.2 + 0 = 0.2
Table 4.5: Epoch 4

Epoch   x1   x2   Y_des   Y_est                                     Error   w1    w2    Status
4       0    0    0       Step ((0 × 0.3 + 0 × 0.2) - 0.4) = 0      0       0.3   0.2   No change
        0    1    0       Step ((0 × 0.3 + 1 × 0.2) - 0.4) = 0      0       0.3   0.2   No change
        1    0    0       Step ((1 × 0.3 + 0 × 0.2) - 0.4) = 0      0       0.3   0.2   No change
        1    1    1       Step ((1 × 0.3 + 1 × 0.2) - 0.4) = 1      0       0.3   0.2   No change

It is observed that within 4 epochs the perceptron learns: the weights are updated to 0.3 and 0.2, with
which the perceptron gives the desired output of the Boolean AND function.
Problem
Problem 1: Assume 𝑤1 = 0.6 𝑎𝑛𝑑 𝑤2 = 0.6, 𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑 = 1 𝑎𝑛𝑑 𝑙𝑒𝑎𝑟𝑛𝑖𝑛𝑔 𝑟𝑎𝑡𝑒 ƞ=0.5.
Compute OR gate using perceptron training rule.
Solution :
1. A=0, B=0 and target=0
𝑤𝑖𝑥𝑖 = 𝑤1𝑥1+𝑤2𝑥2
=0.6*0+0.6*0=0
This is not greater than the threshold value of 1.
So the output =0
2. A=0, B=1 and target =1
𝑤𝑖𝑥𝑖 = 0.6 ∗ 0 +0.6*1= 0.6
This is not greater than the threshold value of 1. So the output =0.
𝑤𝑖=𝑤𝑖+ƞ(t-o) 𝑥𝑖
𝑤1=0.6+0.5(1-0)0=0.6
𝑤2=0.6+0.5(1-0)1=1.1
Now 𝒘𝟏=0.6, 𝒘𝟐=1.1, threshold = 1 and learning rate ƞ=0.5
A   B   Y = A + B (Target)
0   0   0
0   1   1
1   0   1
1   1   1
Problem
• Problem 4: Consider NAND gate, compute Perceptron training rule with W1=1.2,
W2=0.6 threshold =-1 and learning rate=1.5.
• Solution:
A   B   Y = NOT(A · B)
0   0   1
0   1   1
1   0   1
1   1   0
Delta Learning Rule and Gradient Descent
• Generally, learning in neural networks is performed by adjusting the network weights in order to
minimize the difference between the desired and estimated outputs.
• This delta difference is measured as an error function, also called a cost function.
• The cost function is continuous and differentiable.
• This way of learning, called the delta rule (also known as the Widrow-Hoff rule or Adaline rule), is a
type of back propagation applied for training the network.
• The training error of a hypothesis is half the squared difference between the desired target
output and the actual output and is given as follows:
Training Error E = (1/2) Σ_{d ∈ T} (O_desired − O_estimated)²
where T is the training dataset, and O_desired and O_estimated are the desired target output and the
estimated actual output, respectively, for a training instance d.
The principle of gradient descent is to minimize this cost function by iteratively adjusting the weights in
the direction of the negative gradient of the error.
• Gradient descent learning is the foundation of back propagation algorithm used in MLP.
• Before we study about an MLP, let us first understand the different types of neural networks that
differ in their structure, activation function and learning mechanism.
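A minimal sketch of gradient descent with the delta (Widrow-Hoff) rule for a single linear unit; the toy data, learning rate and epoch count are illustrative assumptions:

import numpy as np

def delta_rule_train(X, y_desired, lr=0.1, epochs=100):
    """Minimize E = 1/2 * sum (y_desired - y_est)^2 for a linear unit y_est = X @ w."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        y_est = X @ w
        error = y_desired - y_est
        # The gradient of E with respect to w is -X.T @ error, so step in the opposite direction.
        w += lr * X.T @ error
    return w

# Toy example: learn w close to [2, -1] from targets generated by y = 2*x1 - 1*x2
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([2.0, -1.0, 1.0, 3.0])
print(delta_rule_train(X, y))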
TYPES OF ARTIFICIAL NEURAL NETWORKS
• ANNs consist of multiple neurons arranged in layers. There are different types of ANNs that differ by the
network structure, the activation function involved and the learning rules used.
• In an ANN, there are three layers called input layer, hidden layer and output layer.
• Any general ANN would consist of one input layer, one output layer and zero or more hidden layers.
1. Feed Forward Neural Network
• This is the simplest neural network that consists of neurons which are arranged in layers and the information is
propagated only in the forward direction.
• This model may or may not contain a hidden layer and there is no back propagation.
• Based on the number of hidden layers they are further classified into single-layered and multi-layered feed
forward networks.
• These ANNs are simple to design and easy to maintain.
• They are fast but cannot be used for complex learning.
• They are used for simple classification and simple image
processing, etc.
Figure 10.7: Model of a Feed Forward Neural Network
Conti..
2 Fully Connected Neural Network
• A fully connected neural network, also known as a dense or feedforward neural network, is an artificial
neural network where each neuron in one layer is connected to every neuron in the subsequent layer.
• Information flows unidirectionally, from input to output, without any loops or feedback connections.
Figure 8: Model of a Fully Connected Neural Network
Conti..
3. Multi-Layer Perceptron (MLP)
• This ANN consists of multiple layers with one input layer, one output layer and one or more hidden layers.
• Every neuron in a layer is connected to all neurons in the next layer and thus they are fully connected.
• The information flows in both the directions. In the forward direction, the inputs are multiplied by
weights of neurons and forwarded to the activation function of the neuron and output is passed to the
next layer.
• If the output is incorrect, then in the backward direction, error is back propagated to adjust the weights
and biases to get correct output.
• Thus, the network learns with the training data.
• This type of ANN is used in deep learning for complex classification, speech recognition, medical
diagnosis, forecasting, etc.
Figure 10.9: Model of a Multi-Layer Perceptron
Feedback Neural Network
• Feedback neural networks have feedback connections between neurons that allow information flow in
both directions in the network.
• The output signals can be sent back to the neurons in the same layer or to the neurons in the preceding
layers.
• Hence, this network is more dynamic during training.
• It allows the network to learn from its previous outputs and adapt to dynamic environments.
• This iterative process, where outputs are reused as inputs, enables the network to refine its performance
over time and improve its ability to handle complex tasks or changing data.
Figure 10: Model of a Feedback Neural Network
POPULAR APPLICATIONS OF ARTIFICIAL NEURAL NETWORKS
ANN learning mechanisms are used in many complex applications that involve modelling of non-
linear processes.
ANN is a useful model that can handle even noisy and incomplete data.
They are used to model complex patterns, recognize patterns and solve prediction problems like
humans in many areas such as:
1. Real-time applications: Face recognition, emotion detection, self-driving cars, navigation
systems, routing systems, target tracking, vehicle scheduling, etc.
2. Business applications: Stock trading, sales forecasting, customer behaviour modelling, Market
research and analysis, etc.
3. Banking and Finance: Credit and loan forecasting, fraud and risk evaluation, currency price
prediction, real-estate appraisal, etc.
4. Education: Adaptive learning software, student performance modelling, etc.
5. Healthcare: Medical diagnosis or mapping symptoms to a medical case, image interpretation
and pattern recognition, drug discovery, etc.
6. Other Engineering Applications: Robotics, aerospace, electronics, manufacturing,
communications, chemical analysis, food research, etc.
ADVANTAGES AND DISADVANTAGES OF ANN
Advantages of ANN
1. ANN can solve complex problems involving non-linear processes.
2. ANNs can learn and recognize complex patterns and solve problems as humans solve a
problem.
3. ANNs have a parallel processing capability and can predict in less time.
4. They have an ability to work with inadequate knowledge and can even handle incomplete and
noisy data.
5. They can scale well to larger data sets and outperform other learning mechanisms.
Limitations of ANN
1. An ANN requires processors with parallel processing capability to train the network running for
many epochs. The function of each node requires a CPU capability which is difficult for very large
networks with a large amount of data.
2. They work like a ‘black box’ and it is exceedingly difficult to understand their working in inner
layers. Moreover, it is hard to understand the relationship between the representations learned at
each layer.
CHALLENGES OF ARTIFICIAL NEURAL NETWORKS
The major challenges while modelling a real-time application with ANNs are:
1. Training a neural network is the most challenging part of using this technique. Overfitting or
underfitting issues may arise if datasets used for training are not correct. It is also hard to
generalize to the real-world data when trained with some simulated data. Moreover, neural
network models normally need a lot of training data to be robust and are usable for a real-
time application.
2. Finding appropriate values for the weight and bias parameters of the network is itself a challenging task.
Partho Prosad
 
PPT
Understanding the Key Components and Parts of a Drone System.ppt
Siva Reddy
 
PDF
67243-Cooling and Heating & Calculation.pdf
DHAKA POLYTECHNIC
 
PDF
Biodegradable Plastics: Innovations and Market Potential (www.kiu.ac.ug)
publication11
 
PPT
1. SYSTEMS, ROLES, AND DEVELOPMENT METHODOLOGIES.ppt
zilow058
 
PPTX
Chapter_Seven_Construction_Reliability_Elective_III_Msc CM
SubashKumarBhattarai
 
Packaging Tips for Stainless Steel Tubes and Pipes
heavymetalsandtubes
 
67243-Cooling and Heating & Calculation.pdf
DHAKA POLYTECHNIC
 
business incubation centre aaaaaaaaaaaaaa
hodeeesite4
 
2010_Book_EnvironmentalBioengineering (1).pdf
EmilianoRodriguezTll
 
Victory Precisions_Supplier Profile.pptx
victoryprecisions199
 
CAD-CAM U-1 Combined Notes_57761226_2025_04_22_14_40.pdf
shailendrapratap2002
 
MT Chapter 1.pptx- Magnetic particle testing
ABCAnyBodyCanRelax
 
AI-Driven IoT-Enabled UAV Inspection Framework for Predictive Maintenance and...
ijcncjournal019
 
Zero Carbon Building Performance standard
BassemOsman1
 
STUDY OF NOVEL CHANNEL MATERIALS USING III-V COMPOUNDS WITH VARIOUS GATE DIEL...
ijoejnl
 
FLEX-LNG-Company-Presentation-Nov-2017.pdf
jbloggzs
 
Tunnel Ventilation System in Kanpur Metro
220105053
 
LEAP-1B presedntation xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
hatem173148
 
20ME702-Mechatronics-UNIT-1,UNIT-2,UNIT-3,UNIT-4,UNIT-5, 2025-2026
Mohanumar S
 
The Effect of Artifact Removal from EEG Signals on the Detection of Epileptic...
Partho Prosad
 
Understanding the Key Components and Parts of a Drone System.ppt
Siva Reddy
 
67243-Cooling and Heating & Calculation.pdf
DHAKA POLYTECHNIC
 
Biodegradable Plastics: Innovations and Market Potential (www.kiu.ac.ug)
publication11
 
1. SYSTEMS, ROLES, AND DEVELOPMENT METHODOLOGIES.ppt
zilow058
 
Chapter_Seven_Construction_Reliability_Elective_III_Msc CM
SubashKumarBhattarai
 

Module - 4 Machine Learning -22ISE62.pdf

Posterior Probability
• It is the updated or revised probability of an event, taking into account the observations from the training data. P(Hypothesis | Evidence) is the posterior distribution representing the belief about the hypothesis, given the evidence from the training data.
• Informally, the posterior combines the prior with the new evidence: posterior ∝ likelihood × prior.
Classification Using Bayes Model
• Bayes theorem calculates the conditional probability of an event A given that event B has occurred, P(A | B).
• Generally, Bayes theorem is used to select the most probable hypothesis from data, considering both prior knowledge and posterior distributions.
• It is based on the calculation of the posterior probability and is stated as P(Hypothesis h | Evidence E), where Hypothesis h is the target class to be classified and Evidence E is the given test instance.
• P(Hypothesis h | Evidence E) is calculated from the prior probability P(Hypothesis h), the likelihood probability P(Evidence E | Hypothesis h) and the marginal probability P(Evidence E):
  P(Hypothesis h | Evidence E) = P(Evidence E | Hypothesis h) P(Hypothesis h) / P(Evidence E)
  where P(Hypothesis h) is the prior probability of the hypothesis h without observing the training data or considering any evidence, and P(Evidence E | Hypothesis h) is the likelihood of Evidence E given Hypothesis h.
Maximum A Posteriori (MAP) Hypothesis, h_MAP:
• A hypothesis is a proposed explanation or assumption about the relationship between the input data (features) and the output predictions; it is the model or mapping that the algorithm uses to predict outcomes from given inputs.
• The most probable hypothesis is called the Maximum A Posteriori hypothesis, h_MAP. Using Bayes theorem, and dropping the denominator P(E) that is common to all hypotheses:
  h_MAP = argmax_{h ∈ H} P(Hypothesis h | Evidence E) = argmax_{h ∈ H} P(Evidence E | Hypothesis h) P(Hypothesis h)
Maximum Likelihood (ML) Hypothesis, h_ML:
• Given a set of candidate hypotheses, if every hypothesis is equally probable, only P(E | h) is used to find the most probable hypothesis.
• The hypothesis that maximizes the likelihood P(E | h) is called the Maximum Likelihood hypothesis:
  h_ML = argmax_{h ∈ H} P(Evidence E | Hypothesis h)
Correctness of Bayes Theorem
Consider two events A and B in a sample space S:
A: T F T T F T T F
B: F T T F T F T F
Solution:
P(A) = 5/8, P(B) = 4/8
P(A | B) = P(A ∩ B) / P(B) = 2/4
P(B | A) = P(A ∩ B) / P(A) = 2/5
Checking with Bayes theorem:
P(A | B) = P(B | A) P(A) / P(B) = (2/5 × 5/8) / (4/8) = 2/4
P(B | A) = P(A | B) P(B) / P(A) = (2/4 × 4/8) / (5/8) = 2/5
Problem 2: A boy has a volleyball tournament the next day, but today he feels sick. He is a healthy boy, so there is only a 40% chance that he falls sick. He is very interested in volleyball, so there is a 90% probability that he participates in tournaments, and a 20% probability that he is sick given that he participates in a tournament. Find the probability that the boy participates in the tournament given that he is sick.
Solution:
P(Boy participates in the tournament) = 0.9
P(He is sick | Boy participates in the tournament) = 0.2
P(He is sick) = 0.4
The probability of the boy participating in the tournament given that he is sick is:
P(Boy participates | He is sick) = P(Boy participates) × P(He is sick | Boy participates) / P(He is sick) = (0.9 × 0.2) / 0.4 = 0.45
Hence, there is a 45% probability that the boy participates in the tournament given that he is sick.
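The same calculation can be checked with a few lines of code. The sketch below is just Bayes' rule applied to the numbers of this problem; the function name is illustrative.

```python
# Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E), with the numbers from the
# volleyball problem above.
def posterior(prior, likelihood, evidence):
    return likelihood * prior / evidence

p_participates = 0.9             # P(participates in the tournament)
p_sick_given_participates = 0.2  # P(sick | participates)
p_sick = 0.4                     # P(sick)
print(posterior(p_participates, p_sick_given_participates, p_sick))  # 0.45
```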
NAÏVE BAYES ALGORITHM
• Naïve Bayes is a supervised binary-class or multi-class classification algorithm that works on the principle of Bayes theorem.
• It is considered "naive" because it makes a strong, often unrealistic, assumption: that the features are conditionally independent given the class label. The presence or absence of one feature does not affect the presence or absence of another, which simplifies the calculations.
• Each feature contributes a probability value independently during classification, and hence the algorithm is called naïve.
• It works well even on large datasets, is very fast, and is one of the simplest and most effective classification algorithms.
• Important applications include text classification, recommendation systems and face recognition.
Algorithm: Naïve Bayes
1. Compute the prior probability for the target class.
2. Compute the frequency matrix and likelihood probability for each feature.
3. Use Bayes theorem Eq. (8.1) to calculate the probability of all hypotheses.
4. Use the Maximum A Posteriori (MAP) hypothesis, h_MAP, to classify the test object.
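To make the four steps concrete, here is a minimal sketch of a categorical Naïve Bayes learner in Python. It is not from the textbook; the function names and the data layout (rows of feature values plus a class label) are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train_naive_bayes(X, y):
    """X: list of rows of categorical feature values; y: list of class labels."""
    class_counts = Counter(y)                           # Step 1: class counts -> priors
    priors = {c: n / len(y) for c, n in class_counts.items()}
    freq = defaultdict(Counter)                         # Step 2: frequency matrix
    for row, label in zip(X, y):
        for i, value in enumerate(row):
            freq[(i, label)][value] += 1
    def likelihood(i, value, label):                    # P(feature_i = value | label)
        return freq[(i, label)][value] / class_counts[label]
    return priors, likelihood

def classify(priors, likelihood, x):
    """Steps 3 and 4: unnormalized posterior per class, then the MAP class."""
    scores = dict(priors)
    for c in scores:
        for i, value in enumerate(x):
            scores[c] *= likelihood(i, value, c)
    return max(scores, key=scores.get)
```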
Example 8.2: Assess a student's performance using the Naïve Bayes algorithm with the dataset provided in Table 8.1. Predict whether a student gets a job offer or not in the final year of the course.
Table 8.1: Training Dataset
Sl. No. | CGPA | Interactiveness | Practical knowledge | Communication skill | Job offer
1 | ≥ 9 | Yes | Very good | Good | Yes
2 | ≥ 8 | No | Good | Moderate | Yes
3 | ≥ 9 | No | Average | Poor | No
4 | ≥ 8 | No | Average | Good | No
5 | ≥ 8 | Yes | Good | Moderate | Yes
6 | ≥ 9 | Yes | Good | Moderate | Yes
7 | ≥ 8 | Yes | Good | Poor | No
8 | ≥ 9 | No | Very good | Good | Yes
9 | ≥ 8 | Yes | Good | Good | Yes
10 | ≥ 8 | Yes | Average | Good | Yes
Solution:
Step 1: Compute the prior probability for the target feature 'Job Offer'.
The target feature 'Job Offer' has two classes, 'Yes' and 'No', so this is a binary classification problem. Given a student instance, we need to classify whether 'Job Offer = Yes' or 'Job Offer = No'. From the training dataset, the number of instances with 'Job Offer = Yes' is 7 and with 'Job Offer = No' is 3. The prior probability of each class is the number of instances belonging to that class divided by the total number of instances, so the prior for 'Job Offer = Yes' is 7/10 and for 'Job Offer = No' is 3/10, as shown in Table 8.2.
Table 8.2: Frequency Matrix and Prior Probability of Job Offer
Job offer class | No. of instances | Probability
Yes | 7 | P(Job Offer = Yes) = 7/10
No | 3 | P(Job Offer = No) = 3/10
Step 2: Compute the frequency matrix and likelihood probability for each feature.
Step 2(a): Feature – CGPA
Table 8.3 shows the frequency matrix for the feature CGPA, and Table 8.4 shows how the likelihood probability is calculated for CGPA using conditional probability.
Table 8.3: Frequency Matrix of CGPA
CGPA | Job offer = Yes | Job offer = No
≥ 9 | 3 | 1
≥ 8 | 4 | 0
< 8 | 0 | 2
Total | 7 | 3
Table 8.4: Likelihood Probability of CGPA
CGPA | P(CGPA | Job offer = Yes) | P(CGPA | Job offer = No)
≥ 9 | P(CGPA ≥ 9 | Job Offer = Yes) = 3/7 | P(CGPA ≥ 9 | Job Offer = No) = 1/3
≥ 8 | P(CGPA ≥ 8 | Job Offer = Yes) = 4/7 | P(CGPA ≥ 8 | Job Offer = No) = 0/3
< 8 | P(CGPA < 8 | Job Offer = Yes) = 0/7 | P(CGPA < 8 | Job Offer = No) = 2/3
From the frequency matrix of CGPA, the number of instances with 'CGPA ≥ 9' and 'Job Offer = Yes' is 3, and the total number of instances with 'Job Offer = Yes' is 7. Hence, P(CGPA ≥ 9 | Job Offer = Yes) = 3/7.
Step 2(b): Feature – Interactiveness
Table 8.5 shows the frequency matrix for the feature Interactiveness.
Table 8.5: Frequency Matrix of Interactiveness
Interactiveness | Job offer = Yes | Job offer = No
Yes | 5 | 1
No | 2 | 2
Total | 7 | 3
Table 8.6 shows how the likelihood probability is calculated for Interactiveness using conditional probability.
Table 8.6: Likelihood Probability of Interactiveness
Interactiveness | P(Interactiveness | Job offer = Yes) | P(Interactiveness | Job offer = No)
Yes | P(Interactiveness = Yes | Job Offer = Yes) = 5/7 | P(Interactiveness = Yes | Job Offer = No) = 1/3
No | P(Interactiveness = No | Job Offer = Yes) = 2/7 | P(Interactiveness = No | Job Offer = No) = 2/3
Step 2(c): Feature – Practical Knowledge
Table 8.7 shows the frequency matrix for the feature Practical Knowledge.
Table 8.7: Frequency Matrix of Practical Knowledge
Practical knowledge | Job offer = Yes | Job offer = No
Very good | 2 | 0
Average | 1 | 2
Good | 4 | 1
Total | 7 | 3
Table 8.8: Likelihood Probability of Practical Knowledge
Practical knowledge | P(Practical knowledge | Job offer = Yes) | P(Practical knowledge | Job offer = No)
Very good | P(Practical Knowledge = Very Good | Job Offer = Yes) = 2/7 | P(Practical Knowledge = Very Good | Job Offer = No) = 0/3
Average | P(Practical Knowledge = Average | Job Offer = Yes) = 1/7 | P(Practical Knowledge = Average | Job Offer = No) = 2/3
Good | P(Practical Knowledge = Good | Job Offer = Yes) = 4/7 | P(Practical Knowledge = Good | Job Offer = No) = 1/3
Step 2(d): Feature – Communication Skills
Table 8.9 shows the frequency matrix for the feature Communication Skills.
Table 8.9: Frequency Matrix of Communication Skills
Communication skill | Job offer = Yes | Job offer = No
Good | 4 | 1
Moderate | 3 | 0
Poor | 0 | 2
Total | 7 | 3
Table 8.10: Likelihood Probability of Communication Skills
Communication skill | P(Communication skill | Job offer = Yes) | P(Communication skill | Job offer = No)
Good | P(Communication Skills = Good | Job Offer = Yes) = 4/7 | P(Communication Skills = Good | Job Offer = No) = 1/3
Moderate | P(Communication Skills = Moderate | Job Offer = Yes) = 3/7 | P(Communication Skills = Moderate | Job Offer = No) = 0/3
Poor | P(Communication Skills = Poor | Job Offer = Yes) = 0/7 | P(Communication Skills = Poor | Job Offer = No) = 2/3
Step 3: Use Bayes theorem, P(Hypothesis h | Evidence E) = P(Evidence E | Hypothesis h) P(Hypothesis h) / P(Evidence E), to calculate the probability of each hypothesis.
Given the test data (CGPA ≥ 9, Interactiveness = Yes, Practical knowledge = Average, Communication Skills = Good):
P(Job Offer = Yes | Test data) ∝ P(CGPA ≥ 9 | Job Offer = Yes) × P(Interactiveness = Yes | Job Offer = Yes) × P(Practical knowledge = Average | Job Offer = Yes) × P(Communication Skills = Good | Job Offer = Yes) × P(Job Offer = Yes)
= 3/7 × 5/7 × 1/7 × 4/7 × 7/10 = 0.0175
Similarly, for the other case, 'Job Offer = No':
P(Job Offer = No | Test data) ∝ P(CGPA ≥ 9 | Job Offer = No) × P(Interactiveness = Yes | Job Offer = No) × P(Practical knowledge = Average | Job Offer = No) × P(Communication Skills = Good | Job Offer = No) × P(Job Offer = No)
= 1/3 × 1/3 × 2/3 × 1/3 × 3/10 = 0.0074
The common denominator P(Test data) is the same for both hypotheses and can be ignored when comparing them.
Step 4: Use the Maximum A Posteriori (MAP) hypothesis, h_MAP, to classify the test object to the hypothesis with the highest probability. Since P(Job Offer = Yes | Test data) has the higher value (0.0175 > 0.0074), the test data is classified as 'Job Offer = Yes'.
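The arithmetic of Steps 3 and 4 can be verified directly; a short check of the scores computed above:

```python
# Unnormalized posterior scores for the test instance
# (CGPA >= 9, Interactiveness = Yes, Practical knowledge = Average, Communication = Good).
p_yes = (3/7) * (5/7) * (1/7) * (4/7) * (7/10)   # ~0.0175
p_no  = (1/3) * (1/3) * (2/3) * (1/3) * (3/10)   # ~0.0074
print("Job Offer = Yes" if p_yes > p_no else "Job Offer = No")
```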
Zero Probability Error
In the previous dataset, consider the test data to be (CGPA ≥ 8, Interactiveness = Yes, Practical knowledge = Average, Communication Skills = Good). Computing the posterior scores:
P(Job Offer = Yes | Test data) ∝ P(CGPA ≥ 8 | Job Offer = Yes) × P(Interactiveness = Yes | Job Offer = Yes) × P(Practical knowledge = Average | Job Offer = Yes) × P(Communication Skills = Good | Job Offer = Yes) × P(Job Offer = Yes)
= 4/7 × 5/7 × 1/7 × 4/7 × 7/10 = 0.0233
Similarly, for the other case, 'Job Offer = No':
P(Job Offer = No | Test data) ∝ P(CGPA ≥ 8 | Job Offer = No) × P(Interactiveness = Yes | Job Offer = No) × P(Practical knowledge = Average | Job Offer = No) × P(Communication Skills = Good | Job Offer = No) × P(Job Offer = No)
= 0/3 × 1/3 × 2/3 × 1/3 × 3/10 = 0
Since this probability value is zero, the model fails to predict, and this is called the zero probability error. The problem arises because there are no instances in Table 8.1 with CGPA ≥ 8 and Job Offer = No, so the likelihood for this case is zero.
This zero-probability error can be solved by applying a smoothing technique called Laplace correction. For example, given 1000 data instances in the training dataset, if there are zero instances for a particular value of a feature, we can add 1 instance for each attribute-value pair of that feature; this makes little difference for 1000 data instances, and the overall probability no longer becomes zero.
Table 8.11: Values Scaled to 1000 Instances without Laplace Correction
CGPA | P(CGPA | Job Offer = Yes) | P(CGPA | Job Offer = No)
≥ 9 | P(CGPA ≥ 9 | Job Offer = Yes) = 300/700 | P(CGPA ≥ 9 | Job Offer = No) = 100/300
≥ 8 | P(CGPA ≥ 8 | Job Offer = Yes) = 400/700 | P(CGPA ≥ 8 | Job Offer = No) = 0/300
< 8 | P(CGPA < 8 | Job Offer = Yes) = 0/700 | P(CGPA < 8 | Job Offer = No) = 200/300
Now, add 1 instance for each CGPA-value pair for 'Job Offer = No'. Then:
P(CGPA ≥ 9 | Job Offer = No) = 101/303 = 0.3333
P(CGPA ≥ 8 | Job Offer = No) = 1/303 = 0.0033
P(CGPA < 8 | Job Offer = No) = 201/303 = 0.6634
With the values scaled to 1003 data instances, we get:
P(Job Offer = Yes | Test data) ∝ P(CGPA ≥ 8 | Job Offer = Yes) × P(Interactiveness = Yes | Job Offer = Yes) × P(Practical knowledge = Average | Job Offer = Yes) × P(Communication Skills = Good | Job Offer = Yes) × P(Job Offer = Yes)
= 400/700 × 500/700 × 100/700 × 400/700 × 700/1003 = 0.02325
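A small sketch of the same add-one (Laplace) correction in code, using the scaled counts from Table 8.11 for the class 'Job Offer = No':

```python
# Add one pseudo-count to every CGPA value for the class 'Job Offer = No'.
counts_no = {">=9": 100, ">=8": 0, "<8": 200}       # scaled counts from Table 8.11
total = sum(counts_no.values()) + len(counts_no)    # 300 + 3 = 303
smoothed = {value: (count + 1) / total for value, count in counts_no.items()}
print(smoothed[">=8"])                              # 1/303 ~ 0.0033, no longer zero
```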
Problem 1: Apply the Naïve Bayes classifier to a concept-learning problem, classifying days according to whether someone will play tennis, for the test instance {Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong}.
Day | Outlook | Temperature | Humidity | Wind | Play_Tennis
D1 | Sunny | Hot | High | Weak | No
D2 | Sunny | Hot | High | Strong | No
D3 | Overcast | Hot | High | Weak | Yes
D4 | Rain | Mild | High | Weak | Yes
D5 | Rain | Cool | Normal | Weak | Yes
D6 | Rain | Cool | Normal | Strong | No
D7 | Overcast | Cool | Normal | Strong | Yes
D8 | Sunny | Mild | High | Weak | No
D9 | Sunny | Cool | Normal | Weak | Yes
D10 | Rain | Mild | Normal | Weak | Yes
D11 | Sunny | Mild | Normal | Strong | Yes
D12 | Overcast | Mild | High | Strong | Yes
D13 | Overcast | Hot | Normal | Weak | Yes
D14 | Rain | Mild | High | Strong | No
Problem 2: Estimate the conditional probabilities of each attribute {Color, Legs, Height, Smelly} for the species classes {M, H} using the dataset given in the table. Using these probabilities, estimate the probability values for the new instance {Color = Green, Legs = 2, Height = Tall, Smelly = No}.
No | Color | Legs | Height | Smelly | Species
1 | White | 3 | Short | Yes | M
2 | Green | 2 | Tall | No | M
3 | Green | 3 | Short | Yes | M
4 | White | 3 | Short | Yes | M
5 | Green | 2 | Short | No | H
6 | White | 2 | Tall | No | H
7 | White | 2 | Tall | No | H
8 | White | 2 | Short | Yes | H
Bayes Optimal Classifier
• The Bayes optimal classifier is a probabilistic model that uses Bayes theorem to find the most probable classification for a new instance, given the training data, by combining the predictions of all posterior hypotheses.
• This is different from the Maximum A Posteriori (MAP) hypothesis, h_MAP, which chooses only the single most probable hypothesis.
• Here, a new instance is assigned the classification value C_i that maximizes
  argmax_{C_i} Σ_{h_i ∈ H} P(C_i | h_i) P(h_i | T)
  where T denotes the training data.
Example 8.3: Given a hypothesis space with 4 hypotheses h1, h2, h3 and h4, determine whether the patient is diagnosed as COVID positive or COVID negative using the Bayes optimal classifier.
Table 8.12: Posterior Probability Values
P(h_i | T) | P(COVID Positive | h_i) | P(COVID Negative | h_i)
0.3 | 0 | 1
0.1 | 1 | 0
0.2 | 1 | 0
0.1 | 1 | 0
Solution: h_MAP chooses h1, which has the maximum posterior probability 0.3, and gives the result that the patient is COVID negative. The Bayes optimal classifier instead combines the predictions of h2, h3 and h4, which sum to 0.4, and gives the result that the patient is COVID positive:
Σ_{h_i ∈ H} P(COVID Negative | h_i) P(h_i | T) = 0.3 × 1 = 0.3
Σ_{h_i ∈ H} P(COVID Positive | h_i) P(h_i | T) = 0.1 × 1 + 0.2 × 1 + 0.1 × 1 = 0.4
Therefore, argmax over {COVID Positive, COVID Negative} of Σ_{h_i ∈ H} P(C_i | h_i) P(h_i | T) = COVID Positive.
Thus, the Bayes optimal classifier diagnoses the new instance as COVID positive.
NAÏVE BAYES ALGORITHM FOR CONTINUOUS ATTRIBUTES
1. Gaussian Naive Bayes:
• This approach assumes that each continuous attribute follows a Gaussian (normal) distribution within each class.
• It calculates the probability of a given feature value belonging to a specific class using the Gaussian probability density function (PDF).
• The mean (μ) and standard deviation (σ) of the Gaussian distribution are estimated from the training data for each class.
• The likelihood of a feature value x given a class y is calculated using the Gaussian formula:
  P(x | y) = (1 / (σ √(2π))) · exp(−(x − μ)² / (2σ²))
  where P(x | y) is the likelihood of feature value x given class y, μ is the mean and σ is the standard deviation of the Gaussian distribution for class y, exp is the exponential function and π ≈ 3.14159.
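The Gaussian likelihood is straightforward to implement; a minimal sketch (the value check uses the class mean and standard deviation computed for 'Job Offer = Yes' in the worked example below):

```python
import math

# Gaussian probability density used by Gaussian Naive Bayes.
def gaussian_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# With the class statistics from Step 2(a) below (mu = 8.814, sigma = 0.581):
print(gaussian_pdf(8.5, 8.814, 0.581))   # ~0.59, as used in Step 3 below
```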
2. Discretization:
• Continuous attributes can be converted into discrete categories by creating intervals or bins. For example, a temperature value could be classified into categories like "low", "medium" or "high".
• Different discretization methods can be used, such as:
  – Equal-width binning: dividing the range of the attribute into equal-sized intervals.
  – Equal-frequency binning: dividing the range into intervals such that each interval contains the same number of data points.
  – Quartiles: dividing the data into four groups based on percentiles (25th, 50th, 75th).
• Once discretized, the attributes can be treated as discrete variables in the Naive Bayes algorithm; a small binning sketch follows.
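A quick sketch of equal-width binning with NumPy; the temperature values and the three bin labels are made up for illustration:

```python
import numpy as np

temps = np.array([12.0, 18.5, 24.0, 31.0, 36.5])
edges = np.linspace(temps.min(), temps.max(), 4)   # 3 equal-width bins
labels = np.array(["low", "medium", "high"])
bin_index = np.digitize(temps, edges[1:-1])        # bin each value falls into
print(list(labels[bin_index]))                     # ['low', 'low', 'medium', 'high', 'high']
```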
Problem 1: Assess a student's performance using the Naïve Bayes algorithm with a continuous attribute. Predict whether a student gets a job offer or not in the final year of the course. The training dataset T consists of 10 data instances with the attributes 'CGPA' and 'Interactiveness', as shown in Table 8.13. The target variable is Job Offer, which is classified as Yes or No for a candidate student.
Table 8.13: Training Dataset with Continuous Attribute
Sl. No. | CGPA | Interactiveness | Job offer
1 | 9.5 | Yes | Yes
2 | 8.2 | No | Yes
3 | 9.3 | No | No
4 | 7.6 | No | No
5 | 8.4 | Yes | Yes
6 | 9.1 | Yes | Yes
7 | 7.5 | Yes | No
8 | 9.6 | No | Yes
9 | 8.6 | Yes | Yes
10 | 8.3 | Yes | Yes
Solution:
Step 1: Compute the prior probability for the target feature 'Job Offer'.
Table 8.14: Prior Probability of Target Class
Job offer class | No. of instances | Probability value
Yes | 7 | P(Job Offer = Yes) = 7/10
No | 3 | P(Job Offer = No) = 3/10
Step 2: Compute the frequency matrix and likelihood probability for each feature. The likelihood for a continuous feature is calculated using the Gaussian distribution:
P(X_i = x_k | C_j) = g(x_k, μ_ij, σ_ij)
where X_i is the i-th continuous attribute in the given dataset, x_k is a value of that attribute, and C_j denotes the j-th class of the target feature. μ_ij denotes the mean and σ_ij the standard deviation of the values of the continuous attribute X_i with respect to class j of the target feature. Hence, the normal distribution formula is:
P(X_i = x_k | C_j) = (1 / (σ_ij √(2π))) · exp(−(x_k − μ_ij)² / (2σ_ij²))
Step 2(a): Consider the feature CGPA.
To calculate the likelihood probability for this continuous attribute, first compute the mean and standard deviation of CGPA with respect to each class of the target 'Job Offer'. Here X_i = CGPA.
For the class C_j = 'Job Offer = Yes':
μ_CGPA-Yes = (9.5 + 8.2 + 8.4 + 9.1 + 9.6 + 8.6 + 8.3) / 7 = 8.814286
σ_CGPA-Yes = √( Σ(x_i − μ)² / (N − 1) ) = 0.58146
For the class C_j = 'Job Offer = No':
μ_CGPA-No = 8.13333
σ_CGPA-No = 1.011599
Once the mean and standard deviation are computed, the likelihood probability for any test value can be calculated using the Gaussian distribution formula.
Step 2(b): Consider the feature Interactiveness.
Table 8.15: Frequency Matrix of Interactiveness
Interactiveness | Job offer = Yes | Job offer = No
Yes | 5 | 1
No | 2 | 2
Total | 7 | 3
Table 8.16 shows how the likelihood probability is calculated for Interactiveness using conditional probability.
Table 8.16: Likelihood Probability of Interactiveness
Interactiveness | P(Interactiveness | Job offer = Yes) | P(Interactiveness | Job offer = No)
Yes | P(Interactiveness = Yes | Job Offer = Yes) = 5/7 | P(Interactiveness = Yes | Job Offer = No) = 1/3
No | P(Interactiveness = No | Job Offer = Yes) = 2/7 | P(Interactiveness = No | Job Offer = No) = 2/3
Step 3: Use Bayes theorem to calculate the probability of each hypothesis. Consider the test data to be (CGPA = 8.5, Interactiveness = Yes).
For the hypothesis 'Job Offer = Yes':
P(Job Offer = Yes | Test data) ∝ P(CGPA = 8.5 | Job Offer = Yes) × P(Interactiveness = Yes | Job Offer = Yes) × P(Job Offer = Yes)
To compute P(CGPA = 8.5 | Job Offer = Yes), use the Gaussian distribution formula:
P(CGPA = 8.5 | Job Offer = Yes) = g(x_k = 8.5, μ = 8.814, σ = 0.581)
= (1 / (0.581 √(2π))) · exp(−(8.5 − 8.814)² / (2 × 0.581²)) = 0.594
P(Interactiveness = Yes | Job Offer = Yes) = 5/7
P(Job Offer = Yes) = 7/10
Hence:
P(Job Offer = Yes | Test data) ∝ 0.594 × 5/7 × 7/10 = 0.297
Similarly, for the hypothesis 'Job Offer = No':
P(Job Offer = No | Test data) ∝ P(CGPA = 8.5 | Job Offer = No) × P(Interactiveness = Yes | Job Offer = No) × P(Job Offer = No)
P(CGPA = 8.5 | Job Offer = No) = g(x_k = 8.5, μ = 8.133, σ = 1.0116)
= (1 / (1.0116 √(2π))) · exp(−(8.5 − 8.133)² / (2 × 1.0116²)) = 0.369
P(Interactiveness = Yes | Job Offer = No) = 1/3
P(Job Offer = No) = 3/10
Hence:
P(Job Offer = No | Test data) ∝ 0.369 × 1/3 × 3/10 = 0.0369
Step 4: Use the Maximum A Posteriori (MAP) hypothesis, h_MAP, to classify the test object to the hypothesis with the highest probability. Since P(Job Offer = Yes | Test data) has the higher value, 0.297, the test data is classified as 'Job Offer = Yes'.
Problem 2: Consider a real-time example of predicting a student's result using the Naïve Bayes algorithm. The training dataset T consists of 8 data instances with the attributes 'Assessment', 'Assignment', 'Project' and 'Seminar', as shown in Table 8.17. The target variable is Result, which is classified as Pass or Fail for a candidate student. Given the test data (Assessment = Average, Assignment = Yes, Project = No, Seminar = Good), predict the result of the student.
Table 8.17: Training Dataset
Sl. No. | Assessment | Assignment | Project | Seminar | Result
1 | Good | Yes | Yes | Good | Pass
2 | Average | Yes | No | Poor | Fail
3 | Good | No | Yes | Good | Pass
4 | Average | No | No | Poor | Fail
5 | Average | No | Yes | Good | Pass
6 | Good | No | No | Poor | Pass
7 | Average | Yes | Yes | Good | Fail
8 | Good | Yes | Yes | Poor | Pass
P(Pass | data) ≈ 0.008
P(Fail | data) ≈ 0.00617
Prediction: Pass
Problem 3: Consider a real-time example of predicting whether a car is stolen using the Naïve Bayes algorithm. The training dataset T consists of 10 data instances with the attributes 'Color', 'Type' and 'Origin', as shown in the table. The target variable is Stolen, which is classified as Yes or No.
Table 4.12: Dataset
Example No. | Color | Type | Origin | Stolen
1 | Red | Sports | Domestic | Yes
2 | Red | Sports | Domestic | No
3 | Red | Sports | Domestic | Yes
4 | Yellow | Sports | Domestic | No
5 | Yellow | Sports | Imported | Yes
6 | Yellow | SUV | Imported | No
7 | Yellow | SUV | Imported | Yes
8 | Yellow | SUV | Domestic | No
9 | Red | SUV | Imported | No
10 | Red | Sports | Imported | Yes
Artificial Neural Networks
• A neural network is a type of machine learning algorithm inspired by the human brain. It is a powerful tool that excels at complex problems which are difficult for traditional computer algorithms to handle, such as image recognition and natural language processing.
• Artificial Neural Networks (ANNs) are computational models inspired by the biological neural networks of the human brain. They are used in machine learning to analyze data and make predictions by processing information through interconnected nodes, or "neurons".
• The human brain constitutes a mass of neurons that are all connected as a network, which is effectively a directed graph. These neurons are processing units that receive information, process it and transmit it to other neurons, allowing humans to learn almost any task.
• An ANN is a learning mechanism that models the human brain to solve non-linear and complex problems. Each neuron is modelled as a computing unit, or simply a node, capable of performing complex calculations, and an ANN is a system of many such computing units operating in parallel that can learn from observations.
• Typical applications of ANNs in computer science are Natural Language Processing (NLP), pattern recognition, face recognition, speech recognition, character recognition, text processing, stock prediction, computer vision, etc. ANNs have also been used considerably in other engineering fields such as the chemical industry, medicine, robotics, communications, banking and marketing.
• The human nervous system has billions of neurons, the processing units that allow humans to perceive things, to hear, to see and to smell. It makes us remember, recognize and correlate things around us. It is a learning system consisting of functional units called nerve cells, typically called neurons.
• The human nervous system is divided into two sections, the Central Nervous System (CNS) and the Peripheral Nervous System (PNS). The brain and the spinal cord constitute the CNS, and the neurons inside and outside the CNS constitute the PNS.
• Neurons are basically classified into three types: sensory neurons, motor neurons and interneurons. Sensory neurons get information from different parts of the body and bring it into the CNS, whereas motor neurons receive information from other neurons and transmit commands to the body parts. The CNS consists of interneurons, which connect one neuron to another by receiving information from one neuron and transmitting it to another.
• The basic functionality of a neuron is to receive information, process it and then transmit it to another neuron or to a body part.
Biological Neurons
A typical biological neuron has four parts: dendrites, soma, axon and synapse. The body of the neuron is called the soma.
• Dendrites accept the input information, which is processed in the cell body called the soma.
• A single neuron is connected by axons to around 10,000 other neurons, and through these axons the processed information is passed from one neuron to another.
• A neuron fires if the input information crosses a threshold value, and it transmits signals to another neuron through a synapse.
• A synapse fires with electrical impulses called spikes, which are transmitted to another neuron.
• A single neuron can receive synaptic inputs from one neuron or multiple neurons. These neurons form a network structure that processes input information and gives out a response.
Figure 10.1: A Biological Neuron
Artificial Neurons
• Artificial neurons are modelled on biological neurons and are called nodes.
• A node or neuron can receive one or more pieces of input information and process them.
• Artificial neurons or nodes are connected to one another by connection links, and each connection link is associated with a synaptic weight.
Figure 10.2: Artificial Neurons
Simple Model of an Artificial Neuron
The first mathematical model of a biological neuron was designed by McCulloch & Pitts in 1943. It includes two steps:
1. It receives weighted inputs from other neurons.
2. It operates with a threshold function or activation function.
• The received inputs are computed as a weighted sum, which is given to the activation function; if the sum exceeds the threshold value, the neuron fires.
• The neuron is the basic processing unit that receives a set of inputs x1, x2, x3, …, xn and their associated weights w1, w2, w3, …, wn.
• The summation function 'Net-sum' (Eq. 10.1) computes the weighted sum of the inputs received by the neuron:
  Net-sum = Σ_{i=1}^{n} x_i w_i
• The activation function is a binary step function that outputs 1 if the Net-sum is above the threshold value θ, and 0 if the Net-sum is below θ. It is applied to the Net-sum as:
  f(x) = Activation function(Net-sum)
  Output γ = 1 if f(x) ≥ θ, 0 if f(x) < θ
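A minimal sketch of this neuron in code; the inputs, weights and threshold below are illustrative only (weights of 1 and a threshold of 2 make the unit behave like a logical AND):

```python
# McCulloch-Pitts neuron: weighted sum followed by a binary step activation.
def mp_neuron(inputs, weights, threshold):
    net_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net_sum >= threshold else 0

print(mp_neuron([1, 1], [1, 1], threshold=2))   # 1 (fires)
print(mp_neuron([1, 0], [1, 1], threshold=2))   # 0 (does not fire)
```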
Artificial Neural Network Structure
• An Artificial Neural Network (ANN) imitates the human brain, which exhibits intelligence. It has a network structure represented as a directed graph, with a set of neuron nodes and connection links or edges connecting the nodes.
• The nodes in the graph are arranged in layers and can process information in parallel. The network shown in the figure has three layers: an input layer, a hidden layer and an output layer.
• The input layer receives the input information (x1, x2, x3, …, xn) and passes it to the nodes in the hidden layer. The edges connecting the nodes of the input layer to the hidden layer are associated with synaptic weights called connection weights.
• The computing nodes or neurons perform computations based on the input information they receive, and if the weighted sum of the inputs to a neuron is above the threshold, or activation level, of the neuron, then the neuron fires. Each neuron employs an activation function that determines the output of the neuron.
Figure 10.4: Artificial Neural Network Structure
Activation Function
Activation functions are mathematical functions associated with each neuron in the neural network that map input signals to output signals. They decide whether to fire a neuron or not based on the input signals the neuron receives, and they normalize the output value of each neuron either between 0 and 1 or between −1 and +1. Below are some of the activation functions used in ANNs:
1. Identity Function or Linear Function:
  f(x) = x for all x
  The value of f(x) increases linearly and proportionally with the value of x. This function is useful when we do not want to apply any threshold; the output is just the weighted sum of the input values, and it ranges between −∞ and +∞.
2. Binary Step Function:
  f(x) = 1 if x ≥ θ, 0 if x < θ
  The output value is binary, i.e., 0 or 1, based on the threshold value θ.
3. Bipolar Step Function:
  f(x) = +1 if x ≥ θ, −1 if x < θ
  The output value is bipolar, i.e., +1 or −1, based on the threshold value θ.
4. Sigmoidal Function or Logistic Function:
  σ(x) = 1 / (1 + e^(−x))
  A widely used non-linear activation function that produces an S-shaped curve; the output values lie in the range 0 to 1.
5. Bipolar Sigmoid Function:
  σ(x) = (1 − e^(−x)) / (1 + e^(−x))
  It outputs values between −1 and +1.
6. Ramp Function:
  f(x) = 1 if x > 1; x if 0 ≤ x ≤ 1; 0 if x < 0
  A linear function whose upper and lower limits are fixed.
7. Tanh – Hyperbolic Tangent Function:
  tanh(x) = 2 / (1 + e^(−2x)) − 1
  The tanh function is a scaled version of the sigmoid function and is also non-linear; it likewise suffers from the vanishing gradient problem. The output values range between −1 and 1.
8. ReLU – Rectified Linear Unit Function:
  r(x) = max(0, x), i.e. f(x) = x if x ≥ 0, 0 if x < 0
  This activation function is typically used in the hidden layers of deep learning neural network models. It avoids or reduces the vanishing gradient problem: it outputs a value of 0 for negative inputs and works like a linear function for positive inputs.
9. Softmax Function:
  S(x_i) = e^(x_i) / Σ_j e^(x_j)
  A non-linear function used in the output layer that can handle multiple classes. It calculates the probability of each target class, which ranges between 0 and 1: the probability of the input belonging to a particular class is the exponential of that input value divided by the sum of the exponentials of all the input values.
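For reference, straightforward NumPy sketches of a few of the activation functions listed above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bipolar_sigmoid(x):
    return (1.0 - np.exp(-x)) / (1.0 + np.exp(-x))

def tanh_act(x):
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - np.max(x))        # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))   # three probabilities summing to 1
```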
PERCEPTRON AND LEARNING THEORY
• A perceptron is a fundamental unit in neural networks, essentially a model of a biological neuron.
• It is a binary classifier that takes multiple inputs, applies weights and a bias, and then uses an activation function to produce a single output, typically 0 or 1.
• The perceptron algorithm learns by adjusting the weights to minimize the error between its prediction and the desired output.
The perceptron model consists of 4 parts:
1. Inputs from other neurons
2. Weights and bias
3. Net sum
4. Activation function
The summation function 'Net-sum' computes the weighted sum of the inputs received by the neuron:
  Net-sum = Σ_{i=1}^{n} x_i w_i
After computing the Net-sum, the bias value is added to it and the result is passed to the activation function:
  f(x) = Activation function(Net-sum + bias)
The activation function is a binary step function that outputs 1 if f(x) is above the threshold value θ, and 0 if f(x) is below θ:
  output = 1 if f(x) ≥ θ, 0 if f(x) < θ
Perceptron learning: set the initial weights w1, w2, w3, …, wn and bias θ to random values in the range [−0.5, 0.5]. For each epoch:
1. Compute the weighted sum by multiplying the inputs with the weights and adding the products.
2. Apply the activation function on the weighted sum: Y = Step((x1 w1 + x2 w2) − θ).
3. If the sum is above the threshold value, output a positive value, else output a negative value.
4. Calculate the error by subtracting the estimated output Y_estimated from the desired output Y_desired: e(t) = Y_desired − Y_estimated.
5. Update the weights if there is an error: Δw_i = α × e(t) × x_i, where x_i is the input value, e(t) is the error at step t, α is the learning rate and Δw_i is the difference in weight to be added to w_i.
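A minimal sketch of this training loop in Python; the sample format and function names are assumptions made for illustration, and Problem 1 below traces the same updates by hand:

```python
def step(net, theta):
    return 1 if net - theta >= 0 else 0

def train_perceptron(samples, w, theta, alpha, epochs):
    """samples: list of ((x1, x2), desired) pairs; w: [w1, w2] initial weights."""
    for _ in range(epochs):
        for (x1, x2), desired in samples:
            estimated = step(x1 * w[0] + x2 * w[1], theta)
            error = desired - estimated           # e(t) = Y_desired - Y_estimated
            w[0] += alpha * error * x1            # delta_w_i = alpha * e(t) * x_i
            w[1] += alpha * error * x2
    return w

# AND gate with the initial values used in Problem 1 below:
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(and_samples, [0.3, -0.2], theta=0.4, alpha=0.2, epochs=4)
print([round(v, 2) for v in w])                   # [0.3, 0.2], as in the worked example
```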
Problem 1: Consider a perceptron representing the Boolean function AND with initial weights w1 = 0.3, w2 = −0.2, learning rate α = 0.2 and bias θ = 0.4, as shown in Figure 4.5. The activation function used here is the step function f(x), which gives a binary output value, 0 or 1: if the value of f(x) is greater than or equal to 0, it outputs 1, otherwise it outputs 0. Design a perceptron that performs the Boolean function AND and update the weights until the Boolean function gives the desired output.
Figure 4.5: Perceptron for Boolean function AND
Solution: The desired output for the Boolean function AND is shown in Table 4.1.
Table 4.1: AND Truth Table
x1 | x2 | Y_des
0 | 0 | 0
0 | 1 | 0
1 | 0 | 0
1 | 1 | 1
Table 4.2: Epoch 1
Epoch | x1 | x2 | Y_des | Y_est | Error | w1 | w2 | Status
1 | 0 | 0 | 0 | Step((0 × 0.3 + 0 × −0.2) − 0.4) = 0 | 0 | 0.3 | −0.2 | No change
1 | 0 | 1 | 0 | Step((0 × 0.3 + 1 × −0.2) − 0.4) = 0 | 0 | 0.3 | −0.2 | No change
1 | 1 | 0 | 0 | Step((1 × 0.3 + 0 × −0.2) − 0.4) = 0 | 0 | 0.3 | −0.2 | No change
1 | 1 | 1 | 1 | Step((1 × 0.3 + 1 × −0.2) − 0.4) = 0 | 1 | 0.5 | 0 | Change
For input (1, 1) the weights are updated as follows:
Δw1 = α × e(t) × x1 = 0.2 × 1 × 1 = 0.2; w1 = w1 + Δw1 = 0.3 + 0.2 = 0.5
Δw2 = α × e(t) × x2 = 0.2 × 1 × 1 = 0.2; w2 = w2 + Δw2 = −0.2 + 0.2 = 0
Table 4.3: Epoch 2
Epoch | x1 | x2 | Y_des | Y_est | Error | w1 | w2 | Status
2 | 0 | 0 | 0 | Step((0 × 0.5 + 0 × 0) − 0.4) = 0 | 0 | 0.5 | 0 | No change
2 | 0 | 1 | 0 | Step((0 × 0.5 + 1 × 0) − 0.4) = 0 | 0 | 0.5 | 0 | No change
2 | 1 | 0 | 0 | Step((1 × 0.5 + 0 × 0) − 0.4) = 1 | −1 | 0.3 | 0 | Change
2 | 1 | 1 | 1 | Step((1 × 0.3 + 1 × 0) − 0.4) = 0 | 1 | 0.5 | 0.2 | Change
For input (1, 0) the weights are updated as follows:
Δw1 = α × e(t) × x1 = 0.2 × −1 × 1 = −0.2; w1 = w1 + Δw1 = 0.5 − 0.2 = 0.3
Δw2 = α × e(t) × x2 = 0.2 × −1 × 0 = 0; w2 = w2 + Δw2 = 0 + 0 = 0
For input (1, 1), the weights are updated as follows:
Δw1 = α × e(t) × x1 = 0.2 × 1 × 1 = 0.2; w1 = w1 + Δw1 = 0.3 + 0.2 = 0.5
Δw2 = α × e(t) × x2 = 0.2 × 1 × 1 = 0.2; w2 = w2 + Δw2 = 0 + 0.2 = 0.2
Table 4.4: Epoch 3
Epoch | x1 | x2 | Y_des | Y_est | Error | w1 | w2 | Status
3 | 0 | 0 | 0 | Step((0 × 0.5 + 0 × 0.2) − 0.4) = 0 | 0 | 0.5 | 0.2 | No change
3 | 0 | 1 | 0 | Step((0 × 0.5 + 1 × 0.2) − 0.4) = 0 | 0 | 0.5 | 0.2 | No change
3 | 1 | 0 | 0 | Step((1 × 0.5 + 0 × 0.2) − 0.4) = 1 | −1 | 0.3 | 0.2 | Change
3 | 1 | 1 | 1 | Step((1 × 0.3 + 1 × 0.2) − 0.4) = 1 | 0 | 0.3 | 0.2 | No change
For input (1, 0) the weights are updated as follows:
Δw1 = α × e(t) × x1 = 0.2 × −1 × 1 = −0.2; w1 = w1 + Δw1 = 0.5 − 0.2 = 0.3
Δw2 = α × e(t) × x2 = 0.2 × −1 × 0 = 0; w2 = w2 + Δw2 = 0.2 + 0 = 0.2
Table 10.5: Epoch 4
Epoch | x1 | x2 | Y_des | Y_est | Error | w1 | w2 | Status
4 | 0 | 0 | 0 | Step((0 × 0.3 + 0 × 0.2) − 0.4) = 0 | 0 | 0.3 | 0.2 | No change
4 | 0 | 1 | 0 | Step((0 × 0.3 + 1 × 0.2) − 0.4) = 0 | 0 | 0.3 | 0.2 | No change
4 | 1 | 0 | 0 | Step((1 × 0.3 + 0 × 0.2) − 0.4) = 0 | 0 | 0.3 | 0.2 | No change
4 | 1 | 1 | 1 | Step((1 × 0.3 + 1 × 0.2) − 0.4) = 1 | 0 | 0.3 | 0.2 | No change
It is observed that within 4 epochs the perceptron learns: the weights are updated to 0.3 and 0.2, with which the perceptron gives the desired output of the Boolean AND function.
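A quick check that the learned weights reproduce the AND truth table:

```python
# Learned parameters from Epoch 4: w1 = 0.3, w2 = 0.2, theta = 0.4.
w1, w2, theta = 0.3, 0.2, 0.4
for x1 in (0, 1):
    for x2 in (0, 1):
        y = 1 if (x1 * w1 + x2 * w2) - theta >= 0 else 0
        print(x1, x2, "->", y)   # only the input (1, 1) fires
```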
Problem 1: Assume w1 = 0.6, w2 = 0.6, threshold = 1 and learning rate η = 0.5. Compute the OR gate using the perceptron training rule.
A | B | Y = A + B (Target)
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 1
Solution:
1. A = 0, B = 0, target = 0:
   Σ w_i x_i = w1 x1 + w2 x2 = 0.6 × 0 + 0.6 × 0 = 0. This is not greater than the threshold value of 1, so the output = 0 (correct, no update).
2. A = 0, B = 1, target = 1:
   Σ w_i x_i = 0.6 × 0 + 0.6 × 1 = 0.6. This is not greater than the threshold value of 1, so the output = 0 (incorrect). Update the weights using w_i = w_i + η (t − o) x_i:
   w1 = 0.6 + 0.5 (1 − 0) × 0 = 0.6
   w2 = 0.6 + 0.5 (1 − 0) × 1 = 1.1
Now w1 = 0.6, w2 = 1.1, threshold = 1 and learning rate η = 0.5.
Problem 4: Consider the NAND gate; compute the perceptron training rule with w1 = 1.2, w2 = 0.6, threshold = −1 and learning rate = 1.5.
A | B | Y = NOT(A · B)
0 | 0 | 1
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0
Solution:
Delta Learning Rule and Gradient Descent
• Generally, learning in neural networks is performed by adjusting the network weights in order to minimize the difference between the desired and estimated outputs. This delta difference is measured by an error function, also called a cost function. The cost function is continuous and differentiable, so it can be minimized by gradient-based methods.
• This way of learning, called the delta rule (also known as the Widrow-Hoff rule or Adaline rule), is a type of back propagation applied for training the network.
• The training error of a hypothesis is half the squared difference between the desired target output and the actual output, summed over the training instances:
  Training Error E = (1/2) Σ_{d ∈ T} (O_desired − O_estimated)²
  where T is the training dataset, and O_desired and O_estimated are the desired target output and estimated actual output, respectively, for a training instance d.
• Gradient descent is the optimization principle used to minimize this error, and gradient descent learning is the foundation of the back propagation algorithm used in an MLP. Before we study the MLP, let us first understand the different types of neural networks, which differ in their structure, activation function and learning mechanism.
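A small sketch of gradient descent on this squared-error cost for a single linear (Adaline-style) unit; the toy inputs and targets below are illustrative only:

```python
import numpy as np

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
t = np.array([0.0, 1.0, 1.0, 2.0])    # targets for the linear mapping o = x1 + x2
w = np.zeros(2)
lr = 0.1
for _ in range(200):
    o = X @ w                          # estimated outputs
    grad = -(t - o) @ X                # gradient of E = 1/2 * sum (t - o)^2
    w -= lr * grad                     # step against the gradient
print(w)                               # approaches [1. 1.]
```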
TYPES OF ARTIFICIAL NEURAL NETWORKS
• ANNs consist of multiple neurons arranged in layers. There are different types of ANNs that differ in their network structure, the activation function involved and the learning rules used.
• An ANN has three kinds of layers: the input layer, hidden layers and the output layer. Any general ANN consists of one input layer, one output layer and zero or more hidden layers.
1. Feed Forward Neural Network
• This is the simplest neural network; its neurons are arranged in layers and information is propagated only in the forward direction. The model may or may not contain a hidden layer, and there is no back propagation.
• Based on the number of hidden layers, they are further classified into single-layered and multi-layered feed forward networks.
• These ANNs are simple to design and easy to maintain. They are fast but cannot be used for complex learning. They are used for simple classification, simple image processing, etc.
Figure 10.7: Model of a Feed Forward Neural Network
2. Fully Connected Neural Network
• A fully connected neural network, also known as a dense or feedforward neural network, is an artificial neural network in which each neuron in one layer is connected to every neuron in the subsequent layer.
• Information flows unidirectionally, from input to output, without any loops or feedback connections.
Figure 10.8: Model of a Fully Connected Neural Network
3. Multi-Layer Perceptron (MLP)
• This ANN consists of multiple layers: one input layer, one output layer and one or more hidden layers. Every neuron in a layer is connected to all neurons in the next layer, so the layers are fully connected.
• Information flows in both directions. In the forward direction, the inputs are multiplied by the neuron weights, passed through the activation function of the neuron, and the output is forwarded to the next layer. If the output is incorrect, then in the backward direction the error is back propagated to adjust the weights and biases to obtain the correct output. Thus, the network learns from the training data.
• This type of ANN is used in deep learning for complex classification, speech recognition, medical diagnosis, forecasting, etc.
Figure 10.9: Model of a Multi-Layer Perceptron
4. Feedback Neural Network
• Feedback neural networks have feedback connections between neurons that allow information to flow in both directions in the network. The output signals can be sent back to the neurons in the same layer or to the neurons in the preceding layers. Hence, this network is more dynamic during training.
• It allows the network to learn from its previous outputs and adapt to dynamic environments. This iterative process, where outputs are reused as inputs, enables the network to refine its performance over time and improves its ability to handle complex tasks or changing data.
Figure 10.10: Model of a Feedback Neural Network
POPULAR APPLICATIONS OF ARTIFICIAL NEURAL NETWORKS
ANN learning mechanisms are used in many complex applications that involve modelling of non-linear processes. An ANN is a useful model that can handle even noisy and incomplete data. ANNs are used to model complex patterns, recognize patterns and solve prediction problems, much as humans do, in many areas such as:
1. Real-time applications: face recognition, emotion detection, self-driving cars, navigation systems, routing systems, target tracking, vehicle scheduling, etc.
2. Business applications: stock trading, sales forecasting, customer behaviour modelling, market research and analysis, etc.
3. Banking and finance: credit and loan forecasting, fraud and risk evaluation, currency price prediction, real-estate appraisal, etc.
4. Education: adaptive learning software, student performance modelling, etc.
5. Healthcare: medical diagnosis or mapping symptoms to a medical case, image interpretation and pattern recognition, drug discovery, etc.
6. Other engineering applications: robotics, aerospace, electronics, manufacturing, communications, chemical analysis, food research, etc.
ADVANTAGES AND DISADVANTAGES OF ANN
Advantages of ANN
1. ANNs can solve complex problems involving non-linear processes.
2. ANNs can learn and recognize complex patterns and solve problems much as humans do.
3. ANNs have a parallel processing capability and can make predictions in less time.
4. They have the ability to work with inadequate knowledge and can even handle incomplete and noisy data. They scale well to larger datasets and can outperform other learning mechanisms.
Limitations of ANN
1. An ANN requires processors with parallel processing capability to train the network over many epochs. The function of each node requires CPU capability, which is difficult for very large networks with a large amount of data.
2. They work like a 'black box', and it is exceedingly difficult to understand their working in the inner layers. Moreover, it is hard to understand the relationship between the representations learned at each layer.
CHALLENGES OF ARTIFICIAL NEURAL NETWORKS
The major challenges while modelling a real-time application with ANNs are:
1. Training a neural network is the most challenging part of using this technique. Overfitting or underfitting issues may arise if the datasets used for training are not correct, and it is hard to generalize to real-world data when the network is trained on simulated data. Moreover, neural network models normally need a lot of training data to be robust and usable for a real-time application.
2. Finding the weight and bias parameters.