MUSA AL-HAWAMDAH
  128129001011
•  Bayesian Decision Theory came long before Version Spaces, Decision Tree Learning and Neural Networks. It was studied in the field of Statistical Theory and, more specifically, in the field of Pattern Recognition.
•  Bayesian Decision Theory underlies important learning schemes such as the Naïve Bayes Classifier, the learning of Bayesian Belief Networks, and the EM Algorithm.
•  Bayesian Decision Theory is also useful because it provides a framework within which many non-Bayesian classifiers can be studied.
•  Bayesian reasoning is applied to decision making and to inferential statistics that deal with probability inference. It uses knowledge of prior events to predict future events.

•  Example: predicting the color of marbles in a basket.
•  The Bayes Theorem:

       P(h|D) = P(D|h) * P(h) / P(D)

•  P(h) : Prior probability of hypothesis h
•  P(D) : Prior probability of training data D
•  P(h|D) : Probability of h given D (the posterior)
•  P(D|h) : Probability of D given h (the likelihood)
•  D : a 35-year-old customer with an income of $50,000 p.a.

•  h : the hypothesis that our customer will buy our computer.

•  P(h|D) : Probability that customer D will buy our computer given that we know his age and income (posterior probability).

•  P(h) : Probability that any customer will buy our computer, regardless of age and income (prior probability).

•  P(D|h) : Probability that the customer is 35 yrs old and earns $50,000, given that he has bought our computer (likelihood).

•  P(D) : Probability that a person from our set of customers is 35 yrs old and earns $50,000.

   (These four quantities are combined in the short sketch below.)
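
As a minimal sketch (not part of the original slides), the relationship between these four quantities can be written as a one-line Python function. The function and variable names are ours; the values passed in match the worked example later in these slides.

def bayes_posterior(prior_h, likelihood_d_given_h, evidence_d):
    """Return P(h|D) = P(D|h) * P(h) / P(D)."""
    return likelihood_d_given_h * prior_h / evidence_d

# P(h) = 0.5, P(D|h) = 0.6, P(D) = 0.4  (the values used later in the deck)
print(bayes_posterior(0.5, 0.6, 0.4))   # 0.75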
Example:
•  h1 : Customer buys a computer = Yes
•  h2 : Customer buys a computer = No
       where h1 and h2 are hypotheses from our Hypothesis Space 'H'

•  Most probable hypothesis (final outcome) = arg max{ P(D|h1) P(h1), P(D|h2) P(h2) }

•  P(D) can be ignored, as it is the same for both terms.
•  Theory:

   Generally we want the most probable hypothesis given the training data:
       hMAP = arg max P(h|D)   (where h belongs to H, the hypothesis space)
   Applying Bayes' theorem and dropping P(D), which is the same for every h:
       hMAP = arg max P(D|h) * P(h)
•  If we assume all hypotheses are equally probable a priori, i.e. P(hi) = P(hj) for every hi, hj in H, this simplifies further to the Maximum Likelihood hypothesis:

       hML = arg max P(D|hi)   (where hi belongs to H)

   (The short sketch below contrasts hMAP with hML.)
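
A minimal sketch (assumed, not from the slides) contrasting the two rules over a small hypothesis space. The priors and likelihoods below are purely illustrative.

# Illustrative priors P(h) and likelihoods P(D|h) over a two-hypothesis space H
prior = {"h1": 0.8, "h2": 0.2}
likelihood = {"h1": 0.3, "h2": 0.9}

# hMAP = arg max_h P(D|h) * P(h)   (P(D) dropped; it is the same for every h)
h_map = max(prior, key=lambda h: likelihood[h] * prior[h])

# hML  = arg max_h P(D|h)          (valid when all priors P(hi) are equal)
h_ml = max(likelihood, key=likelihood.get)

print(h_map, h_ml)   # h1 h2 -- the two rules can disagree when priors are unequal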
•  P(buys computer = yes) = 5/10 = 0.5
•  P(buys computer = no) = 5/10 = 0.5
•  P(customer is 35 yrs & earns $50,000) = 4/10 = 0.4
•  P(customer is 35 yrs & earns $50,000 | buys computer = yes) = 3/5 = 0.6
•  P(customer is 35 yrs & earns $50,000 | buys computer = no) = 1/5 = 0.2
•  Customer buys a computer: P(h1|D) = P(h1) * P(D|h1) / P(D) = 0.5 * 0.6 / 0.4 = 0.75

•  Customer does not buy a computer: P(h2|D) = P(h2) * P(D|h2) / P(D) = 0.5 * 0.2 / 0.4 = 0.25

•  Final Outcome = arg max{ P(h1|D), P(h2|D) } = max(0.75, 0.25)

=> Customer buys a computer (checked in the short calculation below).
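
A short check of the arithmetic above, using the values from the slide; the variable names are ours.

p_h1 = p_h2 = 0.5              # P(buys = yes), P(buys = no)
p_d = 0.4                      # P(customer is 35 yrs & earns $50,000)
p_d_h1, p_d_h2 = 0.6, 0.2      # P(D|h1), P(D|h2)

p_h1_d = p_h1 * p_d_h1 / p_d   # 0.75
p_h2_d = p_h2 * p_d_h2 / p_d   # 0.25

print("buys a computer" if p_h1_d > p_h2_d else "does not buy")   # buys a computer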
Naïve Bayesian Classification
•  It is based on Bayes' theorem and is particularly suited when the dimensionality of the inputs is high. Parameter estimation for naïve Bayes models uses the method of maximum likelihood. In spite of its over-simplified assumptions, it often performs well in many complex real-world situations.

•  Advantage: it requires only a small amount of training data to estimate the parameters (a short scikit-learn sketch follows).
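
In practice a library implementation would normally be used. The sketch below assumes scikit-learn (>= 0.22, which provides CategoricalNB) is installed; the two training rows are invented purely to show the API shape and are not taken from the slides.

from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Two invented training tuples with the same four attributes as the example below
X_raw = [["youth", "medium", "yes", "fair"],
         ["senior", "high", "no", "excellent"]]
y = ["yes", "no"]                      # buys_computer labels

enc = OrdinalEncoder()
X = enc.fit_transform(X_raw)           # map category strings to integer codes

clf = CategoricalNB()                  # naive Bayes for categorical features
clf.fit(X, y)
print(clf.predict(enc.transform([["youth", "medium", "yes", "fair"]])))  # ['yes']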
•  X = (age = youth, income = medium, student = yes, credit_rating = fair)

•  Will a person described by tuple X buy a computer?
•  Derivation:
•  D : set of training tuples
           ** each tuple is an 'n'-dimensional attribute vector
           ** X : (x1, x2, x3, …, xn)
•  Let there be 'm' classes: C1, C2, C3, …, Cm
•  The naïve Bayes classifier predicts that X belongs to class Ci iff
           ** P(Ci|X) > P(Cj|X) for 1 <= j <= m, j ≠ i
•  Maximum A Posteriori Hypothesis
           ** P(Ci|X) = P(X|Ci) P(Ci) / P(X)
           ** maximize P(X|Ci) P(Ci), as P(X) is constant
•  With many attributes, it is computationally expensive to evaluate P(X|Ci) directly.
•  Naïve assumption of "class conditional independence": the attributes are assumed independent given the class, so P(X|Ci) factors into a product of per-attribute terms P(xk|Ci) (see the sketch below).
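
A minimal sketch of that factorization; the four factors are the buys_computer = yes values computed on the next slide.

from math import prod

# P(X|C1) ~ P(age=youth|C1) * P(income=medium|C1) * P(student=yes|C1) * P(credit_rating=fair|C1)
p_xk_given_yes = [0.222, 0.444, 0.667, 0.667]
print(round(prod(p_xk_given_yes), 3))   # 0.044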
•  P(C1) = P(buys_computer = yes) = 9/14 = 0.643
•  P(C2) = P(buys_computer = no) = 5/14 = 0.357
•  P(age=youth | buys_computer = yes) = 2/9 = 0.222
•  P(age=youth | buys_computer = no) = 3/5 = 0.600
•  P(income=medium | buys_computer = yes) = 4/9 = 0.444
•  P(income=medium | buys_computer = no) = 2/5 = 0.400
•  P(student=yes | buys_computer = yes) = 6/9 = 0.667
•  P(student=yes | buys_computer = no) = 1/5 = 0.200
•  P(credit_rating=fair | buys_computer = yes) = 6/9 = 0.667
•  P(credit_rating=fair | buys_computer = no) = 2/5 = 0.400
•  P(X | buys_computer = yes) = P(age=youth | buys_computer = yes) * P(income=medium | buys_computer = yes) * P(student=yes | buys_computer = yes) * P(credit_rating=fair | buys_computer = yes) = 0.222 * 0.444 * 0.667 * 0.667 = 0.044

•  P(X | buys_computer = no) = 0.600 * 0.400 * 0.200 * 0.400 = 0.019
•  Find the class Ci that maximizes P(X|Ci) * P(Ci):

   => P(X | buys_computer = yes) * P(buys_computer = yes) = 0.044 * 0.643 = 0.028
   => P(X | buys_computer = no) * P(buys_computer = no) = 0.019 * 0.357 = 0.007

•  Prediction: the person described by tuple X buys a computer (the snippet below reproduces this calculation).
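
The whole prediction can be reproduced from the probabilities listed above with a few lines of Python; this is a sketch for this one example rather than a general implementation.

from math import prod

priors = {"yes": 9/14, "no": 5/14}                     # P(Ci)
cond = {                                               # P(attribute value | Ci)
    "yes": {"age=youth": 2/9, "income=medium": 4/9,
            "student=yes": 6/9, "credit_rating=fair": 6/9},
    "no":  {"age=youth": 3/5, "income=medium": 2/5,
            "student=yes": 1/5, "credit_rating=fair": 2/5},
}
x = ["age=youth", "income=medium", "student=yes", "credit_rating=fair"]

scores = {c: prod(cond[c][a] for a in x) * priors[c] for c in priors}
print({c: round(s, 3) for c, s in scores.items()})     # {'yes': 0.028, 'no': 0.007}
print(max(scores, key=scores.get))                     # yes -> buys a computer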
•  http://en.wikipedia.org/wiki/Bayesian_probability
•  http://en.wikipedia.org/wiki/Naive_Bayes_classifier
•  http://www.let.rug.nl/~tiedeman/ml05/03_bayesian_handout.pdf
•  http://www.statsoft.com/textbook/stnaiveb.html
•  http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-2/www/mlbook/ch6.pdf
•  Chai, K.; Ng, H. T.; Chieu, H. L.; "Bayesian Online Classifiers for Text Classification and Filtering", Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, August 2002, pp. 97-104.
•  Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2003.
Thank you
