Bayesian Data Analysis
By Dr. Sarita Tripathy
Assistant Professor
School of Computer Engineering
KIIT Deemed to be University
Bayesian Data Analysis
• BDA deals with a set of practical methods for making inferences from
the available data.
• These methods use probability models to model the given data and
also predict future values.
• Bayesian analysis is a statistical paradigm that answers research
questions about unknown parameters using probability statements. For
example, what is the probability that the average male height is
between 70 and 80 inches or that the average female height is between
60 and 70 inches?
• Identify the data, define a descriptive model, specify a prior, compute
the posterior distribution, interpret the posterior distribution, and
check that the model is a reasonable description of the data.
Steps in BDA
BDA consists of three steps:
1. Setting up the prior distribution:
• Domain expertise and prior knowledge are used to develop a joint probability
distribution for all parameters of the data under consideration, as well as for the
output data (which needs to be predicted); this is termed the prior distribution.
2. Setting up the posterior distribution:
• After taking the observed data (the given dataset) into account, the appropriate
posterior distribution is calculated and interpreted. This is the conditional probability
distribution of the unknown quantities given the data.
3. Evaluating the fit of the model (a minimal sketch of all three steps follows below).
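The sketch below walks through these three steps with a conjugate Beta-Binomial model; the coin-flip data, the Beta(2, 2) prior, and all variable names are illustrative assumptions, not part of the original slides.

```python
# Minimal sketch of the three BDA steps with a Beta-Binomial model (assumed example).
import numpy as np
from scipy import stats

heads, trials = 12, 20            # hypothetical observed data

# Step 1: prior distribution over the unknown success probability p.
a_prior, b_prior = 2, 2           # Beta(2, 2): mildly informative, centered on 0.5

# Step 2: posterior distribution (conjugate update of a Beta prior with Binomial data).
posterior = stats.beta(a_prior + heads, b_prior + (trials - heads))
print("posterior mean of p:", posterior.mean())
print("95% credible interval:", posterior.interval(0.95))

# Step 3: evaluate the fit by simulating replicated data from the posterior predictive
# distribution and comparing them with the observed count of heads.
rng = np.random.default_rng(0)
p_draws = posterior.rvs(size=5000, random_state=rng)
replicated = rng.binomial(trials, p_draws)
print("observed heads:", heads,
      "| replicated 5th-95th percentile:", np.percentile(replicated, [5, 95]))
```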
BAYESIAN INFERENCE
INTRODUCTION
• Bayesian inference is a method of statistical inference in which
Bayes' theorem is used to update the probability for a hypothesis as
more evidence or information becomes available.
• Bayesian inference is an important technique in statistics, and
especially in mathematical statistics.
• Bayesian inference has found application in a wide range of
activities, including science, engineering, philosophy, medicine,
sport, and law.
• In the philosophy of decision theory, Bayesian inference is closely
related to subjective probability, often called "Bayesian probability".
• Bayes theorem adjusts probabilities given new evidence in the
following way:
P(H0|E)= P(E|H0) P(H0)/ P(E)
• Where, H0 represents the hypothesis, called a null hypothesis,
inferred before new evidence
• P(H0) is called the prior probability of H0.
• P(E|H0) is called the conditional probability of the evidence E given H0.
• P(E) is called the marginal probability of E: the probability of
witnessing the new evidence.
• P(H0|E) is called the posterior probability of H0 given E.
• The factor P(E|H0)/P(E) represents the impact that the
evidence has on the belief in the hypothesis.
• Multiplying the prior probability P(H0) by the factor
P(E|H0)/P(E) will never yield a probability that is greater
than 1.
• Since P(E) is at least as great as P(E∩H0), which equals
P(E|H0) P(H0), replacing P(E) with P(E∩H0) in the factor
P(E|H0)/P(E) yields a posterior probability of exactly 1.
• Therefore, the posterior probability could exceed 1 only if
P(E) were less than P(E∩H0), which is never true (the numeric
check below illustrates the bound).
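The prior and the two likelihood values in this check are made-up numbers chosen only to show the arithmetic.

```python
# Bayes' rule: P(H0|E) = P(E|H0) * P(H0) / P(E), with P(E) from total probability.
def posterior(prior_h0, likelihood_e_given_h0, evidence_e):
    return likelihood_e_given_h0 * prior_h0 / evidence_e

p_h0 = 0.30               # prior P(H0)            (made-up)
p_e_given_h0 = 0.80       # likelihood P(E|H0)     (made-up)
p_e_given_not_h0 = 0.40   # likelihood P(E|not H0) (made-up)

# P(E) >= P(E ∩ H0) = P(E|H0) * P(H0), so the posterior can never exceed 1.
p_e = p_e_given_h0 * p_h0 + p_e_given_not_h0 * (1 - p_h0)
p_h0_given_e = posterior(p_h0, p_e_given_h0, p_e)
print(p_h0_given_e)       # 0.24 / 0.52 ≈ 0.46
assert 0.0 <= p_h0_given_e <= 1.0
```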
LIKELIHOOD FUNCTION
• The probability of E given H0, P(E|H0), can be represented as a
function of its second argument with its first argument held at
a given value. Such a function is called a likelihood function; it
is a function of H0 given E. A ratio of two likelihood functions
is called a likelihood ratio,
Λ = L(H0|E) / L(not H0|E) = P(E|H0) / P(E|not H0)
• The marginal probability, P(E), can also be represented as the
sum, over the mutually exclusive hypotheses, of each hypothesis's prior
probability multiplied by the corresponding conditional probability:
P(E) = P(E|H0) P(H0) + P(E|not H0) P(not H0)
• As a result, we can rewrite Bayes' theorem as
P(H0|E) = P(E|H0) P(H0) / [P(E|H0) P(H0) + P(E|not H0) P(not H0)]
        = Λ P(H0) / [Λ P(H0) + P(not H0)]
• With two independent pieces of evidence E1 and E2, Bayesian
inference can be applied iteratively.
• We could use the first piece of evidence to calculate an initial
posterior probability, and then use that posterior probability as
a new prior probability to calculate a second posterior
probability given the second piece of evidence.
• Independence of evidence implies that,
P(E1, E2| H0)= P(E1| H0) * P(E2|H0)
P(E1, E2)= P(E1)* P(E2)
P(E1, E2| not H0)= P(E1| not H0) * P(E2| not H0)
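Continuing the made-up numbers from the earlier sketch, the snippet below adds a second hypothetical piece of evidence and checks that updating on E1 and then on E2 gives the same posterior as updating on (E1, E2) jointly when the pieces of evidence are conditionally independent.

```python
# Iterative Bayesian updating with two conditionally independent pieces of evidence.
def update(prior_h0, p_e_given_h0, p_e_given_not_h0):
    """One Bayes update: returns P(H0 | E)."""
    p_e = p_e_given_h0 * prior_h0 + p_e_given_not_h0 * (1 - prior_h0)
    return p_e_given_h0 * prior_h0 / p_e

p_h0 = 0.30
e1 = (0.80, 0.40)          # (P(E1|H0), P(E1|not H0)), made-up
e2 = (0.60, 0.20)          # (P(E2|H0), P(E2|not H0)), made-up

# Iterative updating: the posterior after E1 becomes the prior for E2.
after_e1 = update(p_h0, *e1)
after_both = update(after_e1, *e2)

# Joint updating: conditional independence lets the likelihoods multiply.
joint = update(p_h0, e1[0] * e2[0], e1[1] * e2[1])

print(after_e1, after_both, joint)     # 0.46..., 0.72, 0.72
assert abs(after_both - joint) < 1e-12
```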
From which bowl is the cookie?
• Suppose there are two full bowls of cookies. Bowl #1 has 10
chocolate chip and 30 plain cookies, while bowl #2 has 20 of each.
Our friend Hardika picks a bowl at random, and then picks a cookie
at random. We may assume there is no reason to believe Hardika
treats one bowl differently from another, likewise for the cookies.
The cookie turns out to be a plain one. How probable is it that
Hardika picked it out of bowl #1?
(Figure: Bowl #1 and Bowl #2)
• Intuitively, it seems clear that the answer should be more than
a half, since there are more plain cookies in bowl #1.
• The precise answer is given by Bayes' theorem. Let H1
correspond to bowl #1, and H2 to bowl #2.
• It is given that the bowls are identical from Hardika’s point of
view, thus P(H1) = P(H2) and the two must add up to 1, so
both are equal to 0.5.
• Let D be the observation of a plain cookie.
• From the contents of the bowls, we know that
P(D| H1)= 30/40= 0.75 and P(D| H2)= 20/40= 0.5
• Bayes' formula then yields
P(H1|D) = P(H1) P(D|H1) / [P(H1) P(D|H1) + P(H2) P(D|H2)]
        = (0.5 × 0.75) / (0.5 × 0.75 + 0.5 × 0.5)
        = 0.6
• Before observing the cookie, the probability that Hardika
chose bowl #1 is the prior probability, P(H1), which is 0.5.
After observing the cookie, we revise that probability to 0.6.
• It is worth noting that observing the plain cookie updates the
prior probability P(H1) into the posterior probability P(H1|D),
which increases from 0.5 to 0.6 (checked in the short snippet below).
• This reflects our intuition that the cookie is more likely to come
from bowl #1, since it has a higher ratio of plain to chocolate
cookies than the other bowl.
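A one-line numeric check of the cookie calculation, using only the numbers given in the example:

```python
# P(H1|D) = P(H1) P(D|H1) / (P(H1) P(D|H1) + P(H2) P(D|H2))
p_h1, p_h2 = 0.5, 0.5
p_d_h1, p_d_h2 = 30 / 40, 20 / 40
print(p_h1 * p_d_h1 / (p_h1 * p_d_h1 + p_h2 * p_d_h2))   # 0.6
```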
PRIOR PROBABILITY DISTRIBUTION
• In Bayesian statistical inference, a prior probability
distribution, often called simply the prior, of an uncertain
quantity p (e.g., the proportion of voters who
will vote for Mr. Narendra Modi in a future election) is the
probability distribution that would express one's uncertainty
about p before the data (e.g., an election poll) are taken into
account.
• It is meant to attribute uncertainty rather than randomness to
the uncertain quantity.
INTRODUCTION TO NAIVE BAYES
• Suppose your data consist of fruits, described by their color
and shape.
• Bayesian classifiers operate by saying "If you see a fruit that is
red and round, which type of fruit is it most likely to be, based on
the observed data sample? In future, classify red and round
fruit as that type of fruit."
• A difficulty arises when you have more than a few variables
and classes: you would require an enormous number of
observations to estimate these probabilities.
• Naïve Bayes classifiers assume that the effect of a variable's
value on a given class is independent of the values of the other
variables.
• This assumption is called class conditional independence.
• It is made to simplify the computation and in this sense
considered to be Naïve.
APPLICATIONS
1. Computer applications
• Bayesian inference has applications in artificial
intelligence and expert systems.
• Bayesian inference techniques have been a fundamental
part of computerized pattern recognition techniques since
the late 1950s.
• Recently, Bayesian inference has gained popularity in the
phylogenetics community; a number of applications allow
many demographic and evolutionary parameters to be
estimated simultaneously.
2. Bioinformatics applications
• Bayesian inference has been applied in various
bioinformatics applications, including differential
gene expression analysis, single-cell classification,
and cancer subtyping.
ADVANTAGES
• Including good prior information should improve
prediction.
• Including structure can allow the method to
incorporate more data (for example, hierarchical
modeling allows partial pooling so that external data
can be included in a model even if these external
data share only some characteristics with the current
data being modeled).
DISADVANTAGES
• If the prior information is wrong, it can send
inferences in the wrong direction.
• Bayesian inference combines different sources of
information; thus it is no longer an
encapsulation of a particular dataset (which is
sometimes desired, for reasons that go beyond
immediate predictive accuracy and instead
touch on issues of statistical communication).
Naïve Bayesian Classifier
The Naïve Bayesian classifier calculates this posterior probability using Bayes' theorem, as
follows.
From Bayes' theorem on conditional probability, we have
P(Y|X) = P(X|Y) · P(Y) / P(X)
       = P(X|Y) · P(Y) / [P(X|Y = y1) · P(Y = y1) + ⋯ + P(X|Y = yk) · P(Y = yk)]
where
P(X) = Σ_{i=1}^{k} P(X|Y = yi) · P(Y = yi)
Note:
• P(X) is called the evidence (also the total probability), and it is a constant.
• The posterior probability P(Y|X) is therefore proportional to P(X|Y) · P(Y).
• Thus, P(Y|X) can be taken as a measure of the plausibility of Y given X:
P(Y|X) ∝ P(X|Y) · P(Y)
Naïve Bayesian Classifier
Suppose, for a given instance of X (say x = (X1 = x1, …, Xn = xn)), we compare any two
posterior probabilities, P(Y = yi | X = x) and P(Y = yj | X = x).
If P(Y = yi | X = x) > P(Y = yj | X = x), then we say that yi is stronger than yj for
the instance X = x.
The strongest yi is the classification for the instance X = x.
Naive Bayesian Classifier
Example: With reference to the Air Traffic Dataset mentioned earlier, let us tabulate the
class-conditional probabilities P(attribute value | class) as shown below.
Attribute            On Time        Late          Very Late     Cancelled
Day
  Weekday            9/14 = 0.64    1/2 = 0.5     3/3 = 1       0/1 = 0
  Saturday           2/14 = 0.14    1/2 = 0.5     0/3 = 0       1/1 = 1
  Sunday             1/14 = 0.07    0/2 = 0       0/3 = 0       0/1 = 0
  Holiday            2/14 = 0.14    0/2 = 0       0/3 = 0       0/1 = 0
Season
  Spring             4/14 = 0.29    0/2 = 0       0/3 = 0       0/1 = 0
  Summer             6/14 = 0.43    0/2 = 0       0/3 = 0       0/1 = 0
  Autumn             2/14 = 0.14    0/2 = 0       1/3 = 0.33    0/1 = 0
  Winter             2/14 = 0.14    2/2 = 1       2/3 = 0.67    0/1 = 0
Naïve Bayesian Classifier
Instance to classify: Weekday, Winter, High, Heavy → Class = ???
Case 1: Class = On Time : 0.70 × 0.64 × 0.14 × 0.29 × 0.07 = 0.0013
Case 2: Class = Late : 0.10 × 0.50 × 1.0 × 0.50 × 0.50 = 0.0125
Case 3: Class = Very Late : 0.15 × 1.0 × 0.67 × 0.33 × 0.67 = 0.0222
Case 4: Class = Cancelled : 0.05 × 0.0 × 0.0 × 1.0 × 1.0 = 0.0000
Case 3 gives the largest score; hence the classification is Very Late (a short computational check follows).
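The snippet below multiplies each class prior by the corresponding rounded conditional probabilities from the slide (hard-coded here for illustration) and picks the class with the largest score.

```python
# Naive Bayes scores for the instance (Weekday, Winter, High, Heavy).
scores = {
    "On Time":   0.70 * 0.64 * 0.14 * 0.29 * 0.07,
    "Late":      0.10 * 0.50 * 1.00 * 0.50 * 0.50,
    "Very Late": 0.15 * 1.00 * 0.67 * 0.33 * 0.67,
    "Cancelled": 0.05 * 0.00 * 0.00 * 1.00 * 1.00,
}
print(scores)                        # ≈ 0.0013, 0.0125, 0.0222, 0.0
print(max(scores, key=scores.get))   # Very Late
```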
Naïve Bayesian Classifier
Algorithm: Naïve Bayesian Classification
Input: a set of k mutually exclusive and exhaustive classes C = {c1, c2, …, ck}, which have
prior probabilities P(C1), P(C2), …, P(Ck), and an n-attribute set A = {A1, A2, …, An},
which for a given instance has values A1 = a1, A2 = a2, …, An = an.
Step: for each ci ∈ C, calculate the score pi, i = 1, 2, …, k:
pi = P(Ci) × ∏_{j=1}^{n} P(Aj = aj | Ci)
px = max(p1, p2, …, pk)
Output: Cx is the classification.
Note: Σ pi ≠ 1, because the pi are not probabilities but values proportional to the
posterior probabilities. A minimal implementation sketch follows.
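One possible Python rendering of this algorithm for categorical attributes; the toy training records and attribute names are invented placeholders, and no smoothing is applied, exactly as in the algorithm above.

```python
from collections import Counter, defaultdict

def train_naive_bayes(records, labels):
    """Estimate priors P(Ci) and conditionals P(Aj = aj | Ci) by counting."""
    priors = {c: cnt / len(labels) for c, cnt in Counter(labels).items()}
    cond = defaultdict(lambda: defaultdict(Counter))   # cond[class][attribute][value] -> count
    for rec, c in zip(records, labels):
        for attr, value in rec.items():
            cond[c][attr][value] += 1
    return priors, cond, Counter(labels)

def classify(instance, priors, cond, class_counts):
    """Score each class with P(Ci) * prod_j P(Aj = aj | Ci) and return the argmax."""
    scores = {}
    for c, prior in priors.items():
        p = prior
        for attr, value in instance.items():
            p *= cond[c][attr][value] / class_counts[c]   # zero if value never seen with class c
        scores[c] = p
    return max(scores, key=scores.get), scores

# Invented toy data, for illustration only.
records = [{"Day": "Weekday", "Season": "Winter"},
           {"Day": "Weekday", "Season": "Summer"},
           {"Day": "Saturday", "Season": "Winter"}]
labels = ["Late", "On Time", "Late"]
priors, cond, counts = train_naive_bayes(records, labels)
print(classify({"Day": "Weekday", "Season": "Winter"}, priors, cond, counts))
```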
Naïve Bayesian Classifier
Pros and Cons
The Naïve Bayes approach is a very popular one, which often works well.
However, it has a number of potential problems:
• It relies on all attributes being categorical.
• If the training data are scarce, the probabilities are estimated poorly.
Naïve Bayesian Classifier
Approach to overcome the limitations in Naïve Bayesian Classification
Estimating the class-conditional probabilities for continuous attributes
In real-life situations, not all attributes are necessarily categorical; in fact, there is often a mix of
categorical and continuous attributes.
In the following, we discuss the schemes to deal with continuous attributes in a Bayesian classifier.
1. We can discretize each continuous attribute and then replace the continuous
values with the corresponding discrete intervals.
2. We can assume a certain form of probability distribution for the continuous variable and
estimate the parameters of the distribution using the training data. A Gaussian
distribution is usually chosen to represent the class-conditional probabilities for continuous
attributes. A general form of the Gaussian distribution is
P(x; μ, σ²) = (1 / (√(2π) σ)) · exp(−(x − μ)² / (2σ²))
where μ and σ² denote the mean and variance, respectively.
Naïve Bayesian Classifier
For each class Ci, the class-conditional probability for a numeric attribute Aj can be
calculated from a Gaussian (normal) distribution as follows:
P(Aj = aj | Ci) = (1 / (√(2π) σij)) · exp(−(aj − μij)² / (2σij²))
Here, the parameter μij can be calculated as the sample mean of the values of attribute
Aj over the training records that belong to class Ci.
Similarly, σij² can be estimated as the sample variance of those training records.
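A brief sketch of this estimation, assuming a hypothetical list of attribute values for one class; the numbers and the attribute name are illustrative only.

```python
import math

def gaussian_likelihood(aj, values_in_class):
    """P(Aj = aj | Ci) under a Gaussian fitted to the attribute values seen for class Ci."""
    mu = sum(values_in_class) / len(values_in_class)                                  # mu_ij: sample mean
    var = sum((v - mu) ** 2 for v in values_in_class) / (len(values_in_class) - 1)    # sigma_ij^2: sample variance
    return math.exp(-(aj - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical humidity readings for the training records of one class.
humidity_ci = [70.0, 75.0, 80.0, 72.0, 78.0]
print(gaussian_likelihood(74.0, humidity_ci))
```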
THANK YOU.