Bayesian Learning
A sample learning task: Classification
• The system is given a set of instances to learn from
• The system builds/selects a model/hypothesis based
on the given instances
• Based on the learned model the system can classify
unseen instances
Reasoning with probabilities

Bayes rule:

    P(h|D) = P(h) · P(D|h) / P(D)

    posterior = (prior × likelihood) / evidence
Selecting the best hypothesis

    h_MAP = argmax_{h ∈ H} P(h|D)

          = argmax_{h ∈ H} P(h) · P(D|h) / P(D)

          = argmax_{h ∈ H} P(h) · P(D|h)

P(D) can be dropped because it's the same for every h.
MAP = Maximum A Posteriori
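In code, selecting the MAP hypothesis is a one-line argmax. A minimal Python sketch over a hypothetical two-hypothesis space (the priors and likelihoods are invented numbers for illustration):

```python
# Hypothetical hypothesis space: each h carries a prior P(h)
# and a likelihood P(D|h) for some fixed observed data D.
H = {
    "h1": {"prior": 0.3, "likelihood": 0.9},
    "h2": {"prior": 0.7, "likelihood": 0.2},
}

def map_hypothesis(hypotheses):
    """h_MAP = argmax over h of P(h) * P(D|h).
    P(D) is omitted since it is the same for every h."""
    return max(hypotheses,
               key=lambda h: hypotheses[h]["prior"] * hypotheses[h]["likelihood"])

print(map_hypothesis(H))  # "h1": 0.3 * 0.9 = 0.27 beats 0.7 * 0.2 = 0.14
```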
Probability of polyposis

    P(polyposis) = 0.008                                      prior
    P(+|polyposis) = 0.98                                     likelihood
    P(polyposis) · P(+|polyposis) = 0.98 × 0.008 ≈ 0.0078     ∝ posterior
    P(polyposis|+) = ?

Given a medical test is positive, what's the probability
we actually have the polyposis disease?
Probability of polyposis

    P(¬polyposis) = 0.992                                       prior
    P(+|¬polyposis) = 0.03                                      likelihood
    P(¬polyposis) · P(+|¬polyposis) = 0.03 × 0.992 ≈ 0.0298     ∝ posterior
    P(¬polyposis|+) = ?

Given a medical test is positive, what's the probability
we DON'T have the polyposis disease?
Probability of polyposis

    P(polyposis) · P(+|polyposis) ≈ 0.0078
    P(¬polyposis) · P(+|¬polyposis) ≈ 0.0298

    P(polyposis|+) ≈ 0.21
    P(¬polyposis|+) ≈ 0.79

By normalizing (dividing each product by their sum) we obtain the above
posteriors. It's approximately 4 times more likely we DON'T have polyposis.
By comparing the posteriors we find the MAP hypothesis.
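The normalization step can be checked in a few lines of Python, using the slide's numbers:

```python
p_pol, p_not = 0.008, 0.992        # priors P(polyposis), P(not polyposis)
p_pos_pol, p_pos_not = 0.98, 0.03  # likelihoods P(+|h)

joint_pol = p_pos_pol * p_pol      # 0.98 * 0.008 = 0.00784 ~ 0.0078
joint_not = p_pos_not * p_not      # 0.03 * 0.992 = 0.02976 ~ 0.0298
evidence = joint_pol + joint_not   # P(+), the normalizer

print(round(joint_pol / evidence, 2))  # 0.21 -> P(polyposis|+)
print(round(joint_not / evidence, 2))  # 0.79 -> P(not polyposis|+)
```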
Learning from examples

Day | Outlook  | Temp | Humidity | Wind   | Play Hacky Sack
----|----------|------|----------|--------|----------------
 1  | Sunny    | Hot  | High     | Weak   | Yes
 2  | Sunny    | Hot  | High     | Strong | No
 3  | Overcast | Hot  | High     | Strong | Yes
 4  | Rain     | Mild | High     | Strong | No
 5  | Rain     | Cool | Normal   | Strong | No
 6  | Rain     | Cool | Normal   | Strong | No
Naive Bayes classifier

    h_MAP = argmax_{h ∈ H} P(h) · P(D|h)

    C_NB = argmax_{C_k ∈ K} P(C_k) · ∏_i P(x_i|C_k)

Called naive because it assumes all attributes x_i are independent of one
another given the class.
Implicit model (per-class counts read off the table above)

Class Yes (days 1, 3), P(Yes) = 2/6
  Outlook:  Sunny = 1/2, Overcast = 1/2
  Temp:     Hot = 2/2
  Humidity: High = 2/2
  Wind:     Weak = 1/2, Strong = 1/2

Class No (days 2, 4, 5, 6), P(No) = 4/6
  Outlook:  Sunny = 1/4, Rain = 3/4
  Temp:     Hot = 1/4, Mild = 1/4, Cool = 2/4
  Humidity: High = 2/4, Normal = 2/4
  Wind:     Strong = 4/4
Classifying an example

Day | Outlook | Temp | Humidity | Wind   | Play Hacky Sack?
----|---------|------|----------|--------|-----------------
 7  | Sunny   | Hot  | High     | Strong | ?

Priors:
    P(PlayHackySack = yes) = 2/6 ≈ 0.333
    P(PlayHackySack = no)  = 4/6 ≈ 0.666

Likelihoods:
    P(Outlook = sunny | yes) = 1/2 = 0.5    P(Outlook = sunny | no) = 1/4 = 0.25
    P(Temp = hot      | yes) = 2/2 = 1      P(Temp = hot      | no) = 1/4 = 0.25
    P(Humidity = high | yes) = 2/2 = 1      P(Humidity = high | no) = 2/4 = 0.5
    P(Wind = strong   | yes) = 1/2 = 0.5    P(Wind = strong   | no) = 4/4 = 1
Result

Calculating the MAP hypothesis for class 'yes' and 'no':

    P(yes) · P(sunny|yes) · P(hot|yes) · P(high|yes) · P(strong|yes)
        = 0.333 × 0.5 × 1 × 1 × 0.5 = 0.08325

    P(no) · P(sunny|no) · P(hot|no) · P(high|no) · P(strong|no)
        = 0.666 × 0.25 × 0.25 × 0.5 × 1 = 0.0208125

Normalizing gives P(PlayHackySack = yes) ≈ 0.8
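The whole calculation can be reproduced from the counts read off the training table. A minimal Python sketch (the tiny difference from the slide's 0.08325 comes from the slide rounding 2/6 to 0.333 first):

```python
# Priors and per-class likelihoods from the six training days
priors = {"yes": 2/6, "no": 4/6}
likelihoods = {
    "yes": {"sunny": 1/2, "hot": 2/2, "high": 2/2, "strong": 1/2},
    "no":  {"sunny": 1/4, "hot": 1/4, "high": 2/4, "strong": 4/4},
}

def nb_score(cls, attrs):
    """Unnormalized Naive Bayes score: P(C) * prod_i P(x_i|C)."""
    score = priors[cls]
    for a in attrs:
        score *= likelihoods[cls][a]
    return score

day7 = ["sunny", "hot", "high", "strong"]
s_yes, s_no = nb_score("yes", day7), nb_score("no", day7)
print(s_yes, s_no)             # ~0.0833 vs ~0.0208 -> 'yes' wins
print(s_yes / (s_yes + s_no))  # ~0.8, the normalized posterior
```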
Least square error hypothesis

Figure: a hypothesis fitted to data points, where the red lines represent
the per-point errors.
Choosing between two hypotheses

Figure: the red/yellow lines are the errors for h1/h2 respectively.
Maximum Likelihood Hypothesis

    h_ML = argmax_{h ∈ H} P(D|h)

    h_ML = argmax_{h ∈ H} ∏_{i=1}^{m} P(d_i|h)

Assuming all data points are independent.
Assuming uniform priors (so the ML and MAP hypotheses coincide).
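A minimal sketch of the ML rule in Python, over a hypothetical coin-flip hypothesis space (the per-point likelihood functions are invented for illustration):

```python
import math

def ml_hypothesis(hypotheses, data):
    """h_ML = argmax over h of the product of per-point likelihoods P(d_i|h)."""
    def likelihood(h):
        return math.prod(hypotheses[h](d) for d in data)
    return max(hypotheses, key=likelihood)

# Two hypothetical coins: a fair one and one biased toward heads
hypotheses = {
    "fair":   lambda d: 0.5,
    "biased": lambda d: 0.9 if d == "H" else 0.1,
}
# fair: 0.5^4 = 0.0625; biased: 0.9^3 * 0.1 = 0.0729 -> "biased"
print(ml_hypothesis(hypotheses, ["H", "H", "H", "T"]))
```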
Least square error hypothesis
2
2
))((
2
1
1
2
2
1
maxarg
ii xhdm
iHh
ML eh


 

using a least square error method has a profound theoretical basis


m
i
ii
Hh
ML xhdh
1
2
))((minarg
assuming the noise is normally distributed
Squared error analysis

  x  | error h1 | error h2 | squared e1 | squared e2
-----|----------|----------|------------|-----------
  2  |   -1.5   |    2     |    2.25    |     4
  4  |    4.5   |    6     |   20.25    |    36
  6  |   -1.5   |   -2     |    2.25    |     4
  8  |    1.5   |   -1     |    2.25    |     1
 10  |    6.5   |    2     |   42.25    |     4
     |          | product  |  9745.39   |  2304

h2 is more likely than h1 since it has a smaller
squared error product.
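The table can be reproduced directly. Note the derivation above actually minimizes the *sum* of squared errors; the slide compares products, and both criteria agree on these numbers:

```python
import math

h1_errors = [-1.5, 4.5, -1.5, 1.5, 6.5]
h2_errors = [2, 6, -2, -1, 2]

sq1 = [e * e for e in h1_errors]  # [2.25, 20.25, 2.25, 2.25, 42.25]
sq2 = [e * e for e in h2_errors]  # [4, 36, 4, 1, 4]

print(math.prod(sq1), math.prod(sq2))  # ~9745.39 vs 2304 (the table's products)
print(sum(sq1), sum(sq2))              # 69.25 vs 49 -> h2 also wins on the sum
```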
Overfitting

The model found by the learning algorithm is too complex: it fits noise in
the training examples and so classifies unseen instances poorly.
Model selection

• Usually tested by splitting the data into training and
testing examples
• Do model selection based on different data sets
• Prefer simple hypotheses over more complex ones
(assign lower priors to complex hypotheses)
Minimum Description Length learning (MDL)

• Encoding both the hypothesis and the data using an optimal
encoding outputs the MAP hypothesis:

    h_MDL = argmin_{h ∈ H} L_C1(h) + L_C2(D|h)

    h_MDL = h_MAP

Compression, probability and regularity are all closely related!
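A toy sketch of the MDL choice in Python. The bit counts are hypothetical; real MDL requires actual codes C1 for hypotheses and C2 for the data given a hypothesis:

```python
# Hypothetical description lengths in bits: L_C1(h) encodes the model,
# L_C2(D|h) encodes the data's misfits given the model.
H = {
    "simple":  {"model_bits": 10, "data_bits": 40},  # short model, poorer fit
    "complex": {"model_bits": 45, "data_bits": 12},  # long model, better fit
}

def mdl_hypothesis(hypotheses):
    """h_MDL = argmin over h of L_C1(h) + L_C2(D|h)."""
    return min(hypotheses,
               key=lambda h: hypotheses[h]["model_bits"] + hypotheses[h]["data_bits"])

print(mdl_hypothesis(H))  # "simple": 50 bits total vs 57
```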
References

• Tom M. Mitchell (1997). Machine Learning, pp. 154–200.