EM algorithm and its application in Probabilistic Latent Semantic Analysis (pLSA)

Duc-Hieu Tran
tdh.net [at] gmail.com
Nanyang Technological University

July 27, 2010
Outline

   The parameter estimation problem
   EM algorithm
   Probabilistic Latent Semantic Analysis
   References
  Introduction

   Given the prior probabilities P(ωi) and the class-conditional densities p(x|ωi),
   we obtain the optimal classifier:
           P(ωj|x) ∝ p(x|ωj) P(ωj)
           decide ωi if P(ωi|x) > P(ωj|x), ∀j ≠ i
   In practice, p(x|ωi) is unknown and must be estimated from training samples
   (e.g., assume p(x|ωi) ∼ N(µi, Σi)).
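   To make the decision rule concrete, here is a minimal Python sketch (my own illustration, not from the slides) that assumes Gaussian class-conditionals as in the example, estimates µi and Σi from labeled training samples, and classifies a new point by the largest p(x|ωi) P(ωi). The function and variable names are my own.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussian_bayes(X, y):
    """Estimate priors P(w_i) and Gaussian class-conditionals p(x|w_i) from labeled data."""
    classes = np.unique(y)
    priors = {c: float(np.mean(y == c)) for c in classes}
    params = {c: (X[y == c].mean(axis=0), np.cov(X[y == c].T)) for c in classes}
    return priors, params

def classify(x, priors, params):
    """Decide w_i if P(w_i|x) > P(w_j|x) for all j != i, i.e. maximize p(x|w_i) P(w_i)."""
    scores = {c: multivariate_normal.pdf(x, mean=mu, cov=cov) * priors[c]
              for c, (mu, cov) in params.items()}
    return max(scores, key=scores.get)
```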
  Frequentist vs. Bayesian schools

   Frequentist
           parameters – quantities whose values are fixed but unknown.
           the best estimate of their values – the one that maximizes the
           probability of obtaining the observed samples.
   Bayesian
           parameters – random variables having some known prior distribution.
           observation of the samples converts this to a posterior density,
           revising our opinion about the true values of the parameters.
  Examples

           training samples: S = {(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})}
           frequentist: maximum likelihood

               \max_\theta \prod_i p(y^{(i)} \mid x^{(i)}; \theta)

           Bayesian: P(\theta) – prior, e.g., P(\theta) \sim \mathcal{N}(0, I)

               P(\theta \mid S) \propto \left[ \prod_{i=1}^{m} P(y^{(i)} \mid x^{(i)}, \theta) \right] P(\theta)

               \theta_{\mathrm{MAP}} = \arg\max_\theta P(\theta \mid S)
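   A small numerical illustration of the two estimates (not part of the original slides): estimating the mean of a Gaussian with known variance, once by maximum likelihood and once by MAP under an assumed N(0, 1) prior on the mean. The closed-form MAP estimate shrinks the sample mean toward the prior mean; the variable names and the specific prior are my assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=50)   # observed samples, known sigma = 1

# Frequentist: maximum likelihood -> sample mean
mu_mle = x.mean()

# Bayesian: prior mu ~ N(0, tau^2); the posterior is Gaussian, MAP = posterior mean
sigma2, tau2 = 1.0, 1.0
n = len(x)
mu_map = (n / sigma2) / (n / sigma2 + 1.0 / tau2) * x.mean()   # shrinks toward the prior mean 0

print(f"MLE estimate: {mu_mle:.3f},  MAP estimate: {mu_map:.3f}")
```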
EM algorithm


  An estimation problem

           training set of m independent samples: {x^{(1)}, x^{(2)}, \ldots, x^{(m)}}
           goal: fit the parameters of a model p(x, z) to the data
           the log-likelihood:

               \ell(\theta) = \sum_{i=1}^{m} \log p(x^{(i)}; \theta)
                            = \sum_{i=1}^{m} \log \sum_{z} p(x^{(i)}, z; \theta)

           explicitly maximizing ℓ(θ) might be difficult.
           z – latent random variable
           if z^{(i)} were observed, maximum likelihood estimation would be easy.
           strategy: repeatedly construct a lower bound on ℓ (E-step) and
           optimize that lower bound (M-step).
  EM algorithm (1)

           digression: Jensen's inequality.
           f – convex function; E[f(X)] ≥ f(E[X])
           for each i, Q_i – a distribution over z: \sum_z Q_i(z) = 1, Q_i(z) ≥ 0

               \ell(\theta) = \sum_i \log p(x^{(i)}; \theta)
                            = \sum_i \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)}; \theta)
                            = \sum_i \log \sum_{z^{(i)}} Q_i(z^{(i)}) \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}          (1)
                            \ge \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}        (2)

           where (2) follows by applying Jensen's inequality to the concave function log
           (details in the appendix).
  EM algorithm (2)

           for any set of distributions Q_i, formula (2) gives a lower bound on ℓ(θ)
           how to choose Q_i?
           strategy: make the inequality hold with equality at our particular
           value of θ.
           require:
               \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} = c
           c – a constant that does not depend on z^{(i)}
           choose: Q_i(z^{(i)}) \propto p(x^{(i)}, z^{(i)}; \theta)
           since \sum_z Q_i(z^{(i)}) = 1, we get

               Q_i(z^{(i)}) = \frac{p(x^{(i)}, z^{(i)}; \theta)}{\sum_z p(x^{(i)}, z; \theta)}
                            = \frac{p(x^{(i)}, z^{(i)}; \theta)}{p(x^{(i)}; \theta)}
                            = p(z^{(i)} \mid x^{(i)}; \theta)
  EM algorithm (3)

           Q_i – posterior distribution of z^{(i)} given x^{(i)} and the parameters θ

   EM algorithm: repeat until convergence
           E-step: for each i
               Q_i(z^{(i)}) := p(z^{(i)} \mid x^{(i)}; \theta)
           M-step:
               \theta := \arg\max_\theta \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}

   The algorithm will converge, since ℓ(θ^{(t)}) ≤ ℓ(θ^{(t+1)})
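   A minimal sketch (my own, not from the slides) of these two steps for a concrete model: a two-component 1-D Gaussian mixture, where the posterior Q_i is the component responsibility and the M-step has a closed form. No safeguards against degenerate components are included; all names are mine.

```python
import numpy as np

def norm_pdf(x, mu, sigma):
    """Gaussian density, used as the component density p(x | z)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def em_gmm_1d(x, n_iter=100):
    """EM for a two-component 1-D Gaussian mixture; Q_i is the responsibility p(z | x_i)."""
    pi = 0.5
    mu = np.array([x.min(), x.max()])
    sigma = np.array([x.std(), x.std()])
    for _ in range(n_iter):
        # E-step: Q_i(z) := p(z | x_i; theta)
        p1 = pi * norm_pdf(x, mu[0], sigma[0])
        p2 = (1.0 - pi) * norm_pdf(x, mu[1], sigma[1])
        r = p1 / (p1 + p2)                           # responsibility of component 1
        # M-step: closed-form maximization of the lower bound
        pi = r.mean()
        mu = np.array([np.sum(r * x) / np.sum(r),
                       np.sum((1.0 - r) * x) / np.sum(1.0 - r)])
        sigma = np.array([np.sqrt(np.sum(r * (x - mu[0]) ** 2) / np.sum(r)),
                          np.sqrt(np.sum((1.0 - r) * (x - mu[1]) ** 2) / np.sum(1.0 - r))])
    return pi, mu, sigma
```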
  EM algorithm (4)

   Digression: coordinate ascent algorithm.

               \max_\alpha W(\alpha_1, \ldots, \alpha_m)

           loop until convergence:
               for i \in 1, \ldots, m:
                   \alpha_i := \arg\max_{\hat{\alpha}_i} W(\alpha_1, \ldots, \hat{\alpha}_i, \ldots, \alpha_m)

   EM algorithm as coordinate ascent:

               J(Q, \theta) = \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}

           ℓ(θ) ≥ J(Q, θ)
           EM algorithm can be viewed as coordinate ascent on J
           E-step: maximize w.r.t. Q
           M-step: maximize w.r.t. θ
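   For illustration only (not from the slides): generic coordinate ascent on a concave quadratic, maximizing over one coordinate at a time while the others are held fixed — the same alternating pattern EM follows with Q and θ. The objective, matrix, and function names are assumptions of mine.

```python
import numpy as np

def coordinate_ascent(coord_argmax, alpha, n_sweeps=50):
    """Repeatedly maximize W over one coordinate while the others stay fixed."""
    for _ in range(n_sweeps):
        for i in range(len(alpha)):
            alpha[i] = coord_argmax(alpha, i)        # argmax of W over alpha_i alone
    return alpha

# Example: W(a) = -(a - b)^T A (a - b) with A positive definite (concave in a).
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])

def argmax_coord(alpha, i):
    # Setting dW/dalpha_i = 0 with the other coordinates fixed gives a closed form.
    others = A[i] @ (alpha - b) - A[i, i] * (alpha[i] - b[i])
    return b[i] - others / A[i, i]

print(coordinate_ascent(argmax_coord, np.zeros(2)))   # converges to b = [1, -2]
```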
Probabilistic Latent Semantic Analysis


  Probabilistic Latent Semantic Analysis (1)

           set of documents D = {d_1, \ldots, d_N}
           set of words W = {w_1, \ldots, w_M}
           set of unobserved classes Z = {z_1, \ldots, z_K}
           conditional independence assumption:

               P(d_i, w_j \mid z_k) = P(d_i \mid z_k) P(w_j \mid z_k)                      (3)

           so,

               P(w_j \mid d_i) = \sum_{k=1}^{K} P(z_k \mid d_i) P(w_j \mid z_k)            (4)

               P(d_i, w_j) = P(d_i) \sum_{k=1}^{K} P(w_j \mid z_k) P(z_k \mid d_i)

           (derivation in the appendix)
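   The generative story behind these equations (select a document, pick a latent class, generate a word; detailed in the appendix) is easy to mirror in code. This is an illustrative sketch under assumed parameter matrices; the function and array names are mine, not from the slides.

```python
import numpy as np

def sample_plsa(p_d, p_z_given_d, p_w_given_z, n_samples, seed=0):
    """Sample (document, word) index pairs from the pLSA generative model:
    d ~ P(d), z ~ P(z|d), w ~ P(w|z)."""
    rng = np.random.default_rng(seed)
    pairs = []
    for _ in range(n_samples):
        d = rng.choice(len(p_d), p=p_d)                          # select a document d_i
        z = rng.choice(p_z_given_d.shape[1], p=p_z_given_d[d])   # pick a latent class z_k
        w = rng.choice(p_w_given_z.shape[1], p=p_w_given_z[z])   # generate a word w_j
        pairs.append((d, w))
    return pairs
```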
  Probabilistic Latent Semantic Analysis (2)

           n(d_i, w_j) – number of occurrences of word w_j in document d_i
           Likelihood

               L = \prod_{i=1}^{N} \prod_{j=1}^{M} [P(d_i, w_j)]^{n(d_i, w_j)}
                 = \prod_{i=1}^{N} \prod_{j=1}^{M} \left[ P(d_i) \sum_{k=1}^{K} P(w_j \mid z_k) P(z_k \mid d_i) \right]^{n(d_i, w_j)}

           log-likelihood ℓ = log(L)

               \ell = \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \left[ \log P(d_i) + \log \sum_{k=1}^{K} P(w_j \mid z_k) P(z_k \mid d_i) \right]
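   The parameter-dependent part of this log-likelihood is straightforward to evaluate from a count matrix. A minimal numpy sketch (my own; the array names n_dw, p_w_given_z, p_z_given_d and the small smoothing constant are assumptions):

```python
import numpy as np

def plsa_log_likelihood(n_dw, p_w_given_z, p_z_given_d):
    """Parameter-dependent part of the pLSA log-likelihood
    (the sum_j n(d_i, w_j) log P(d_i) term does not involve P(w|z) or P(z|d)).

    n_dw        : (N, M) count matrix, n_dw[i, j] = n(d_i, w_j)
    p_w_given_z : (K, M) matrix, rows P(w | z_k)
    p_z_given_d : (N, K) matrix, rows P(z | d_i)
    """
    p_w_given_d = p_z_given_d @ p_w_given_z          # (N, M): sum_k P(z_k|d_i) P(w_j|z_k)
    return float(np.sum(n_dw * np.log(p_w_given_d + 1e-12)))
```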
  Probabilistic Latent Semantic Analysis (3)

           maximize ℓ w.r.t. P(w_j | z_k), P(z_k | d_i)
           equivalent to maximizing (the log P(d_i) term does not depend on these parameters):

               \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \log \sum_{k=1}^{K} P(w_j \mid z_k) P(z_k \mid d_i)
               = \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \log \sum_{k=1}^{K} Q_k(z_k) \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{Q_k(z_k)}
               \ge \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} Q_k(z_k) \log \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{Q_k(z_k)}

           choose

               Q_k(z_k) = \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{\sum_{l=1}^{K} P(w_j \mid z_l) P(z_l \mid d_i)} = P(z_k \mid d_i, w_j)

           (details in the appendix)
  Probabilistic Latent Semantic Analysis (4)

           equivalent to maximizing (w.r.t. P(w_j | z_k), P(z_k | d_i))

               \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} P(z_k \mid d_i, w_j) \log \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{P(z_k \mid d_i, w_j)}

           and, since the term -\sum_k P(z_k \mid d_i, w_j) \log P(z_k \mid d_i, w_j) does not involve
           the parameters, equivalent to maximizing

               \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} P(z_k \mid d_i, w_j) \log [P(w_j \mid z_k) P(z_k \mid d_i)]
  Probabilistic Latent Semantic Analysis (5)

   EM algorithm
           E-step: update

               P(z_k \mid d_i, w_j) = \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{\sum_{l=1}^{K} P(w_j \mid z_l) P(z_l \mid d_i)}

           M-step: maximize w.r.t. P(w_j | z_k), P(z_k | d_i)

               \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} P(z_k \mid d_i, w_j) \log [P(w_j \mid z_k) P(z_k \mid d_i)]

           subject to

               \sum_{j=1}^{M} P(w_j \mid z_k) = 1, \quad k \in \{1, \ldots, K\}
               \sum_{k=1}^{K} P(z_k \mid d_i) = 1, \quad i \in \{1, \ldots, N\}
  Probabilistic Latent Semantic Analysis (6)

   Solution of the maximization problem in the M-step:

               P(w_j \mid z_k) = \frac{\sum_{i=1}^{N} n(d_i, w_j) P(z_k \mid d_i, w_j)}{\sum_{m=1}^{M} \sum_{n=1}^{N} n(d_n, w_m) P(z_k \mid d_n, w_m)}

               P(z_k \mid d_i) = \frac{\sum_{j=1}^{M} n(d_i, w_j) P(z_k \mid d_i, w_j)}{n(d_i)}

   where n(d_i) = \sum_{j=1}^{M} n(d_i, w_j)

   (derivation via Lagrange multipliers in the appendix)
  Probabilistic Latent Semantic Analysis (7)

   All together
           E-step:

               P(z_k \mid d_i, w_j) = \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{\sum_{l=1}^{K} P(w_j \mid z_l) P(z_l \mid d_i)}

           M-step:

               P(w_j \mid z_k) = \frac{\sum_{i=1}^{N} n(d_i, w_j) P(z_k \mid d_i, w_j)}{\sum_{m=1}^{M} \sum_{n=1}^{N} n(d_n, w_m) P(z_k \mid d_n, w_m)}

               P(z_k \mid d_i) = \frac{\sum_{j=1}^{M} n(d_i, w_j) P(z_k \mid d_i, w_j)}{n(d_i)}
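   Putting the two updates into code: a minimal numpy sketch of exactly these formulas (my own illustration, not from the slides). The array names n_dw, p_w_z, p_z_d, the random initialization, the small smoothing constants, and the fixed iteration count (no convergence check) are all assumptions.

```python
import numpy as np

def plsa_em(n_dw, K, n_iter=100, seed=0):
    """pLSA via EM on an (N, M) document-word count matrix n_dw.
    Returns P(w|z) as a (K, M) matrix and P(z|d) as an (N, K) matrix."""
    rng = np.random.default_rng(seed)
    N, M = n_dw.shape
    p_w_z = rng.random((K, M)); p_w_z /= p_w_z.sum(axis=1, keepdims=True)   # P(w|z)
    p_z_d = rng.random((N, K)); p_z_d /= p_z_d.sum(axis=1, keepdims=True)   # P(z|d)

    for _ in range(n_iter):
        # E-step: P(z_k | d_i, w_j) proportional to P(w_j|z_k) P(z_k|d_i), shape (N, M, K)
        p_z_dw = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        p_z_dw /= p_z_dw.sum(axis=2, keepdims=True) + 1e-12

        # M-step: re-estimate P(w|z) and P(z|d) from the expected counts
        expected = n_dw[:, :, None] * p_z_dw                # n(d_i,w_j) P(z_k|d_i,w_j)
        p_w_z = expected.sum(axis=0).T                      # (K, M): sum over documents
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = expected.sum(axis=1)                        # (N, K): sum over words
        p_z_d /= n_dw.sum(axis=1, keepdims=True) + 1e-12    # divide by n(d_i)
    return p_w_z, p_z_d
```

   A typical call would pass a document-term count matrix and a chosen number of topics, e.g. `p_w_z, p_z_d = plsa_em(counts, K=10)`.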
References


   R.O. Duda, P.E. Hart, and D.G. Stork, Pattern Classification, Wiley-Interscience, 2001.
   T. Hofmann, "Unsupervised learning by probabilistic latent semantic analysis," Machine Learning, vol. 42, 2001, pp. 177–196.
   A. Ng, "Machine Learning (CS229)" course notes, Stanford University.
Appendix

   Generative model for word/document co-occurrence
           select a document d_i with probability (w.p.) P(d_i)
           pick a latent class z_k w.p. P(z_k | d_i)
           generate a word w_j w.p. P(w_j | z_k)

               P(d_i, w_j) = \sum_{k=1}^{K} P(d_i, w_j \mid z_k) P(z_k)
                           = \sum_{k=1}^{K} P(w_j \mid z_k) P(d_i \mid z_k) P(z_k)
                           = \sum_{k=1}^{K} P(w_j \mid z_k) P(z_k \mid d_i) P(d_i)
                           = P(d_i) \sum_{k=1}^{K} P(w_j \mid z_k) P(z_k \mid d_i)

   Since P(d_i, w_j) = P(w_j \mid d_i) P(d_i),

               \Longrightarrow P(w_j \mid d_i) = \sum_{k=1}^{K} P(z_k \mid d_i) P(w_j \mid z_k)
               P(w_j \mid d_i) = \sum_{k=1}^{K} P(z_k \mid d_i) P(w_j \mid z_k)

   Since \sum_{k=1}^{K} P(z_k \mid d_i) = 1, P(w_j \mid d_i) is a convex combination of the P(w_j \mid z_k),
   i.e., each document is modelled as a mixture of topics.
               P(z_k \mid d_i, w_j) = \frac{P(d_i, w_j \mid z_k) P(z_k)}{P(d_i, w_j)}                                        (5)
                                    = \frac{P(w_j \mid z_k) P(d_i \mid z_k) P(z_k)}{P(d_i, w_j)}                             (6)
                                    = \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{P(w_j \mid d_i)}                                (7)
                                    = \frac{P(w_j \mid z_k) P(z_k \mid d_i)}{\sum_{l=1}^{K} P(w_j \mid z_l) P(z_l \mid d_i)} (8)

   From (5) to (6) by the conditional independence assumption (3). From (6) to (7) using
   P(d_i \mid z_k) P(z_k) = P(z_k \mid d_i) P(d_i) and P(d_i, w_j) = P(w_j \mid d_i) P(d_i). From (7) to (8) by (4).
   Lagrange multipliers τ_k, ρ_i:

               H = \sum_{i=1}^{N} \sum_{j=1}^{M} n(d_i, w_j) \sum_{k=1}^{K} P(z_k \mid d_i, w_j) \log [P(w_j \mid z_k) P(z_k \mid d_i)]
                   + \sum_{k=1}^{K} \tau_k \Big( 1 - \sum_{j=1}^{M} P(w_j \mid z_k) \Big)
                   + \sum_{i=1}^{N} \rho_i \Big( 1 - \sum_{k=1}^{K} P(z_k \mid d_i) \Big)

               \frac{\partial H}{\partial P(w_j \mid z_k)} = \frac{\sum_{i=1}^{N} P(z_k \mid d_i, w_j) n(d_i, w_j)}{P(w_j \mid z_k)} - \tau_k = 0

               \frac{\partial H}{\partial P(z_k \mid d_i)} = \frac{\sum_{j=1}^{M} n(d_i, w_j) P(z_k \mid d_i, w_j)}{P(z_k \mid d_i)} - \rho_i = 0
   From \sum_{j=1}^{M} P(w_j \mid z_k) = 1:

               \tau_k = \sum_{j=1}^{M} \sum_{i=1}^{N} P(z_k \mid d_i, w_j) n(d_i, w_j)

   From \sum_{k=1}^{K} P(z_k \mid d_i, w_j) = 1:

               \rho_i = n(d_i)

   Substituting τ_k and ρ_i back gives the M-step formulas for P(w_j | z_k) and P(z_k | d_i).
   Applying Jensen's inequality

   f(x) = log(x), a concave function:

               f\left( E_{z^{(i)} \sim Q_i}\left[ \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \right] \right)
               \ge E_{z^{(i)} \sim Q_i}\left[ f\left( \frac{p(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \right) \right]
