An Introduction to HMM

Browny
2010.07.21
MM vs. HMM

[Diagram: in a Markov model the states are directly observable; in an HMM the states are hidden and emit the observations.]
Markov Model
• Given 3 weather states:
  – {S1, S2, S3} = {rain, cloudy, sunny}
                    Rain   Cloudy   Sunny
          Rain      0.4    0.3      0.3
          Cloudy    0.2    0.6      0.2
          Sunny     0.1    0.1      0.8


• What is the probability that the next 7 days
  will be {sun, sun, rain, rain, sun, cloud, sun}?
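
To make the computation concrete, here is a minimal Python sketch. The start state is not given on the slide, so (as in Rabiner's version of this example) the sketch assumes today is known to be sunny; the function name is illustrative:

```python
import numpy as np

# Transition matrix from the table above; order: 0=rain, 1=cloudy, 2=sunny
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])
RAIN, CLOUDY, SUNNY = 0, 1, 2

def sequence_prob(seq, start):
    """P(seq | model, today = start): product of one-step transitions."""
    p, prev = 1.0, start
    for s in seq:
        p *= A[prev, s]
        prev = s
    return p

obs = [SUNNY, SUNNY, RAIN, RAIN, SUNNY, CLOUDY, SUNNY]
print(sequence_prob(obs, start=SUNNY))
# 0.8 * 0.8 * 0.1 * 0.4 * 0.3 * 0.1 * 0.2 ≈ 1.54e-4
```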
Hidden Markov Model
• The states
  – are hidden: we cannot observe them directly
  – but they can be inferred indirectly from
    observations

• Example
  – North Pole or Equator (model), Hot/Cold (state),
    1/2/3 ice creams (observation)
Hidden Markov Model
• Each observation is a probabilistic function of
  the state; the state itself is not directly
  observable

[Diagram: hidden states emitting observations]
HMM Elements
• N, the number of states in the model
• M, the number of distinct observation
  symbols
• A, the state transition probability distribution
• B, the observation symbol probability
  distribution in states
• π, the initial state distribution
• Together these define the model λ = (A, B, π)
Example
• The ice cream / weather HMM (C = cold, H = hot):

               P(…|C)   P(…|H)   P(…|Start)
  P(1|…)        0.7      0.1        –
  P(2|…)        0.2      0.2        –         B: observation
  P(3|…)        0.1      0.7        –
  P(C|…)        0.8      0.1        0.5
  P(H|…)        0.1      0.8        0.5       A: transition, π: initial
  P(STOP|…)     0.1      0.1        0
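
Written as data, the table above becomes three arrays, which the later sketches reuse. A minimal rendering (the STOP row is kept out of A, so its rows sum to 0.9; explicit termination handling is left aside here):

```python
import numpy as np

# States: 0 = C (cold), 1 = H (hot); observation symbols: 1, 2, 3 ice creams
pi = np.array([0.5, 0.5])            # π: P(C|Start), P(H|Start)

A = np.array([[0.8, 0.1],            # P(C|C), P(H|C)
              [0.1, 0.8]])           # P(C|H), P(H|H)
# each row also has P(STOP|·) = 0.1, omitted here

B = np.array([[0.7, 0.2, 0.1],       # P(1|C), P(2|C), P(3|C)
              [0.1, 0.2, 0.7]])      # P(1|H), P(2|H), P(3|H)
```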
3 Problems
1. Which model best matches the observed sequence?
   P(observations | model)
2. Which state sequence best explains the observed
   sequence, given the model?
   P(state sequence | observations, model)
3. Which model is most likely to have produced the
   observed sequence?
   Which model maximizes P(observations | model)?
Solution 1
• Given the model, what is the probability of
  generating an observation sequence, P(O|λ)?

[Trellis diagram: states S1, S2, S3 at times t = 1, 2, 3,
each emitting R1 or R2]

  What is the probability of observing R1 R1 R2?
Solution 1
• Consider a fixed state sequence
                Q = q1, q2, …, qT

• The probability of generating a particular
  observation sequence O from Q is

  P(O|Q, λ) = P(O1|q1, λ) * P(O2|q2, λ) * … * P(OT|qT, λ)

            = bq1(O1) * bq2(O2) * … * bqT(OT)
Solution 1
• The probability of this particular state
  sequence is

     P(Q|λ) = πq1 * aq1q2 * aq2q3 * … * aq(T-1)qT

• Given the model, the probability of generating
  the observation sequence, P(O|λ), sums over all
  possible state sequences:

     P(O|λ) = Σ_{q1,q2,…,qT} P(O|Q, λ) * P(Q|λ)

            = Σ_{q1,q2,…,qT} πq1·bq1(O1) · aq1q2·bq2(O2) · … · aq(T-1)qT·bqT(OT)
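
Evaluated literally, this sum enumerates all N^T state sequences. A brute-force sketch (illustrative only; `pi`, `A`, `B` as in the example arrays, observations as 0-based symbol indices) shows where the cost on the next slide comes from:

```python
from itertools import product

def brute_force_likelihood(obs, pi, A, B):
    """P(O|λ) by direct summation over all N^T state sequences."""
    N, T = len(pi), len(obs)
    total = 0.0
    for Q in product(range(N), repeat=T):           # every state sequence
        p = pi[Q[0]] * B[Q[0], obs[0]]              # πq1 · bq1(O1)
        for t in range(1, T):
            p *= A[Q[t-1], Q[t]] * B[Q[t], obs[t]]  # aq(t-1)qt · bqt(Ot)
        total += p
    return total
```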
Solution 1
• Complexity (N: number of states)
  – On the order of 2T·N^T operations: (2T−1)·N^T
    multiplications and N^T − 1 additions
    (N^T: number of possible state sequences)
  – For N = 5 states and T = 100 observations, that is
    on the order of 2·100·5^100 ≈ 10^72 computations!!
• Forward Algorithm
  – Forward variable αt(i): the probability of the
    forward partial observation sequence O1, O2, …, Ot
    and being in state Si at time t, given the model
    (see the sketch after the next slides)

           αt(i) = P(O1, O2, …, Ot, qt = Si | λ)
Solution 1
[Trellis diagram: states S1, S2, S3 at times t = 1, 2, 3; when O1 = R1]

  α1(i) = πi·bi(O1),   1 ≤ i ≤ N

  α1(1) = π1·b1(O1)
  α1(2) = π2·b2(O1)
  α1(3) = π3·b3(O1)

  α2(1) = [α1(1)·a11 + α1(2)·a21 + α1(3)·a31] · b1(O2)
  α2(2) = [α1(1)·a12 + α1(2)·a22 + α1(3)·a32] · b2(O2)
Forward Algorithm
• Initialization:
     α1(i) = πi·bi(O1),   1 ≤ i ≤ N

• Induction:
     αt+1(j) = [Σ_{i=1..N} αt(i)·aij] · bj(Ot+1),   1 ≤ t ≤ T−1,  1 ≤ j ≤ N

• Termination:
     P(O|λ) = Σ_{i=1..N} αT(i)
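
The induction is one matrix-vector product per step, so the whole pass is O(N²T). A minimal numpy sketch (function name illustrative; `pi`, `A`, `B` as in the example arrays, `obs` holding 0-based symbol indices):

```python
import numpy as np

def forward(obs, pi, A, B):
    """alpha[t, i] = P(O1..Ot, qt = Si | λ); returns (alpha, P(O|λ))."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # α1(i) = πi·bi(O1)
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]    # induction step
    return alpha, alpha[-1].sum()                     # P(O|λ) = Σi αT(i)
```

For example, `forward([2, 0, 1], pi, A, B)` scores the ice-cream sequence 3, 1, 2 under the example model. For long sequences the α values underflow, so practical implementations scale each step or work in log space.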
Backward Algorithm
• Forward Algorithm
        αt(i) = P(O1, O2, …, Ot, qt = Si | λ)

• Backward Algorithm
  – βt(i): the probability of the backward partial
    observation sequence Ot+1, Ot+2, …, OT, given
    state Si at time t and the model

        βt(i) = P(Ot+1, Ot+2, …, OT | qt = Si, λ)
Backward Algorithm
• Initialization
        βT(i) = 1,   1 ≤ i ≤ N

• Induction
        βt(i) = Σ_{j=1..N} aij·bj(Ot+1)·βt+1(j),
        t = T−1, T−2, …, 1,   1 ≤ i ≤ N
Backward Algorithm
[Trellis diagram: states S1, S2, S3 at times t = 1, 2, 3; when OT = R1]

  βT−1(1) = Σ_{j=1..N} a1j·bj(OT)·βT(j)

          = a11·b1(OT) + a12·b2(OT) + a13·b3(OT)
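
A matching numpy sketch of the backward pass, under the same conventions as the forward sketch above:

```python
import numpy as np

def backward(obs, pi, A, B):
    """beta[t, i] = P(O(t+1)..OT | qt = Si, λ)."""
    T, N = len(obs), len(pi)
    beta = np.ones((T, N))                            # βT(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t+1]] * beta[t+1])    # Σj aij·bj(Ot+1)·βt+1(j)
    return beta
```

As a consistency check, `(pi * B[:, obs[0]] * beta[0]).sum()` recovers the same P(O|λ) as the forward pass.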
Solution 2
• Which state sequence best explains the observed
  sequence, given the model?
  P(state sequence | observations, model)

• There is no single exact answer: several
  optimality criteria exist, and different
  constraints on the state sequence lead to
  different solutions
Solution 2
• Example: choose the states qt that are individually
  most likely
  – γt(i): the probability of being in state Si at
    time t, given the observation sequence O and
    the model λ

     γt(i) = P(qt = Si | O, λ) = αt(i)·βt(i) / P(O|λ)

           = αt(i)·βt(i) / Σ_{i=1..N} αt(i)·βt(i)

     qt* = argmax_{1≤i≤N} [γt(i)],   1 ≤ t ≤ T
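
Combining the two passes gives γ and the individually most likely states. A small sketch reusing the `forward` and `backward` functions defined above:

```python
import numpy as np

def posterior_states(obs, pi, A, B):
    """gamma[t, i] = P(qt = Si | O, λ) and the per-step argmax states."""
    alpha, p_obs = forward(obs, pi, A, B)
    beta = backward(obs, pi, A, B)
    gamma = alpha * beta / p_obs          # each row sums to 1
    return gamma, gamma.argmax(axis=1)    # qt* = argmax_i γt(i)
```

Note that this criterion maximizes the expected number of correct states but may not yield a valid path at all (it can chain states through zero-probability transitions), which motivates the Viterbi criterion next.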
Viterbi algorithm
• The most widely used criterion is to find the
  "single best state sequence"

     maximize P(Q | O, λ), equivalently maximize P(Q, O | λ)

• A formal technique exists, based on dynamic
  programming methods, and is called the Viterbi
  algorithm
Viterbi algorithm
• To find the single best state sequence, Q =
  {q1, q2, …, qT}, for the given observation
  sequence O = {O1, O2, …, OT}

• δt(i): the best score (highest probability) along a
  single path, at time t, which accounts for the
  first t observations and ends in state Si

   δt(i) = max_{q1,q2,…,qt−1} P(q1 q2 … qt = Si, O1 O2 … Ot | λ)
Viterbi algorithm
• Initialization - δ1(i)
  – When t = 1 the most probable path to a
    state does not sensibly exist

  – However we use the probability of being in
    that state given t = 1 and the observation O1

                δ1(i) = πi·bi(O1),   1 ≤ i ≤ N
                ψ1(i) = 0
Viterbi algorithm
• Calculate δt(i) when t > 1
  – δt(X): the probability of the most probable path
    to state X at time t
  – This path to X must pass through one of the
    states A, B or C at time (t−1)

  Most probable path to X through A:  δt−1(A) · aAX · bX(Ot)
Viterbi algorithm
• Recursion
   δt(j) = max_{1≤i≤N} [δt−1(i)·aij] · bj(Ot),   2 ≤ t ≤ T,  1 ≤ j ≤ N

   ψt(j) = argmax_{1≤i≤N} [δt−1(i)·aij],   1 ≤ j ≤ N

• Termination
   P* = max_{1≤i≤N} [δT(i)]

   qT* = argmax_{1≤i≤N} [δT(i)]
Viterbi algorithm
• Path (state sequence) backtracking
   qt* = ψt+1(qt+1*),   t = T−1, T−2, …, 1

   qT−1* = ψT(qT*) = argmax_{1≤i≤N} [δT−1(i)·ai,qT*]
   …
   q1* = ψ2(q2*)
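
The full recursion, termination, and backtracking fit in a short function. A minimal numpy sketch under the same conventions as the earlier sketches (probabilities are multiplied directly; a real implementation would work in log space):

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Single best state sequence via dynamic programming."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]                   # δ1(i) = πi·bi(O1), ψ1(i)=0
    for t in range(1, T):
        scores = delta[t-1][:, None] * A           # scores[i, j] = δt−1(i)·aij
        psi[t] = scores.argmax(axis=0)             # ψt(j)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    q = np.zeros(T, dtype=int)
    q[-1] = delta[-1].argmax()                     # qT*
    for t in range(T - 2, -1, -1):
        q[t] = psi[t+1][q[t+1]]                    # backtracking
    return q, delta[-1].max()                      # best path and P*
```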
Solution 3
• Which model λ = (A, B, π) is most likely to have
  produced the observed sequence?
  Which model maximizes P(observations | model)?

• There is no known analytic solution. We can
  choose λ = (A, B, π) such that P(O|λ) is locally
  maximized using an iterative procedure
Baum-Welch Method
• Define ξt(i, j) = P(qt = Si, qt+1 = Sj | O, λ)
  – The probability of being in state Si at time t,
    and state Sj at time t+1

   ξt(i, j) = αt(i)·aij·bj(Ot+1)·βt+1(j) / P(O|λ)

            = αt(i)·aij·bj(Ot+1)·βt+1(j) /
              Σ_{i=1..N} Σ_{j=1..N} αt(i)·aij·bj(Ot+1)·βt+1(j)
Baum-Welch Method
• γt(i): the probability of being in state Si at time
  t, given the observation sequence O and the
  model λ

     γt(i) = αt(i)·βt(i) / P(O|λ) = αt(i)·βt(i) / Σ_{i=1..N} αt(i)·βt(i)

• Relate γt(i) to ξt(i, j)

     γt(i) = Σ_{j=1..N} ξt(i, j)
Baum-Welch Method
• The expected number of times that state Si is
  visited (and left):

     Σ_{t=1..T−1} γt(i) = expected number of transitions from Si

• Similarly, the expected number of transitions
  from state Si to state Sj:

     Σ_{t=1..T−1} ξt(i, j) = expected number of transitions from Si to Sj
Baum-Welch Method
• Re-estimation formulas for π, A and B

  π̄i = γ1(i)

  āij = Σ_{t=1..T−1} ξt(i, j) / Σ_{t=1..T−1} γt(i)

      = expected number of transitions from state Si to Sj
        / expected number of transitions from state Si

  b̄j(k) = Σ_{t=1..T, s.t. Ot=vk} γt(j) / Σ_{t=1..T} γt(j)

         = expected number of times in state j and observing symbol vk
           / expected number of times in state j
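
A single re-estimation pass can be written directly from these formulas, reusing the `forward` and `backward` sketches above (no scaling, single observation sequence; function name illustrative):

```python
import numpy as np

def baum_welch_step(obs, pi, A, B):
    """One EM re-estimation pass; returns the updated (π̄, Ā, B̄)."""
    alpha, p_obs = forward(obs, pi, A, B)
    beta = backward(obs, pi, A, B)
    T, N = alpha.shape
    gamma = alpha * beta / p_obs                       # γt(i)
    # ξ[t, i, j] = αt(i)·aij·bj(Ot+1)·βt+1(j) / P(O|λ)
    xi = (alpha[:-1, :, None] * A[None, :, :] *
          (B[:, obs[1:]].T * beta[1:])[:, None, :]) / p_obs
    new_pi = gamma[0]                                  # π̄i = γ1(i)
    new_A = xi.sum(0) / gamma[:-1].sum(0)[:, None]     # āij
    obs_arr = np.asarray(obs)
    new_B = np.stack([gamma[obs_arr == k].sum(0)       # b̄j(k)
                      for k in range(B.shape[1])], axis=1)
    new_B /= gamma.sum(0)[:, None]
    return new_pi, new_A, new_B
```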
Baum-Welch Method
• P(O|λ̄) ≥ P(O|λ): the re-estimated model λ̄ never
  decreases the likelihood

• Iteratively using λ̄ in place of λ and repeating
  the re-estimation improves P(O|λ) until some
  limiting point (a local maximum) is reached
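
Putting it together, the iteration simply feeds λ̄ back in until the likelihood stops improving. A sketch reusing the functions above (the tolerance and iteration cap are arbitrary choices, and real code would compare scaled log-likelihoods):

```python
def train(obs, pi, A, B, tol=1e-9, max_iter=100):
    """Repeat re-estimation until P(O|λ) reaches a limiting point."""
    _, p_old = forward(obs, pi, A, B)
    for _ in range(max_iter):
        pi, A, B = baum_welch_step(obs, pi, A, B)
        _, p_new = forward(obs, pi, A, B)
        if p_new - p_old < tol:                 # local maximum reached
            break
        p_old = p_new
    return pi, A, B
```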
