Sargur N. Srihari
srihari@cedar.buffalo.edu
Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/CSE574/index.html
HMM Topics
1. What is an HMM?
2. State-space Representation
3. HMM Parameters
4. Generative View of HMM
5. Determining HMM Parameters Using EM
6. Forward-Backward or α−β algorithm
7. HMM Implementation Issues:
   a) Length of Sequence
   b) Predictive Distribution
   c) Sum-Product Algorithm
   d) Scaling Factors
   e) Viterbi Algorithm
1. What is an HMM?
• Ubiquitous tool for modeling time series data
• Used in
  • Almost all speech recognition systems
  • Computational molecular biology
    • Group amino acid sequences into proteins
  • Handwritten word recognition
• It is a tool for representing probability distributions over sequences of observations
• HMM gets its name from two defining properties:
  • Observation xt at time t was generated by some process whose state zt is hidden from the observer
    • Example: z are phoneme sequences, x are acoustic observations
  • Assumes that state zt depends only on state zt-1 and is independent of all prior states (first order)
Graphical Model of HMM
• Has the graphical model shown below (a chain of latent variables zn, each emitting an observation xn); the latent variables are discrete
• Joint distribution has the form:
  p(x_1, \ldots, x_N, z_1, \ldots, z_N) = p(z_1) \left[ \prod_{n=2}^{N} p(z_n \mid z_{n-1}) \right] \prod_{n=1}^{N} p(x_n \mid z_n)
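As a concrete illustration (not from the slides), this factorization can be evaluated directly for a toy discrete-observation HMM; the parameters pi, A, B and the helper joint_log_prob below are hypothetical example choices:

```python
import numpy as np

# Hypothetical toy HMM with 2 states and 2 observation symbols (illustrative only)
pi = np.array([0.6, 0.4])                  # p(z_1): initial state probabilities
A = np.array([[0.7, 0.3],                  # A[j, k] = p(z_n = k | z_{n-1} = j)
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],                  # B[k, v] = p(x_n = v | z_n = k) (emission)
              [0.3, 0.7]])

def joint_log_prob(z, x):
    """log p(x_1..x_N, z_1..z_N) for a given state path z and observation sequence x."""
    lp = np.log(pi[z[0]]) + np.log(B[z[0], x[0]])      # p(z_1) * p(x_1 | z_1)
    for n in range(1, len(z)):
        lp += np.log(A[z[n - 1], z[n]]) + np.log(B[z[n], x[n]])   # p(z_n|z_{n-1}) p(x_n|z_n)
    return lp

print(joint_log_prob(z=[0, 0, 1], x=[0, 0, 1]))
```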
HMM Viewed as Mixture
• A single time slice corresponds to a mixture distribution with component densities p(x|z)
  • Each state of discrete variable z represents a different component
• An extension of mixture model
  • Choice of mixture component depends on choice of mixture component for previous distribution
• Latent variables are multinomial variables zn
  • That describe component responsible for generating xn
  • Can use one-of-K coding scheme (sketched below)
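A minimal sketch of the one-of-K coding scheme mentioned above (illustrative only; the helper one_of_K is not from the slides):

```python
import numpy as np

def one_of_K(state, K):
    """Return the one-of-K binary vector for a latent variable in the given state (1-indexed)."""
    z = np.zeros(K, dtype=int)
    z[state - 1] = 1          # exactly one component is set: the active state / mixture component
    return z

print(one_of_K(state=2, K=3))   # -> [0 1 0], i.e. z_n is in state 2
```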
2. State-Space Representation
• Probability distribution of zn depends on the state of the previous latent variable zn-1 through the probability distribution p(zn | zn-1)
• One-of-K coding: e.g., zn in state 2 is the binary vector (zn1, ..., znK) = (0, 1, ..., 0), and zn-1 in state 1 is (zn-1,1, ..., zn-1,K) = (1, 0, ..., 0)
• Since latent variables are K-dimensional binary vectors,

  A_{jk} = p(z_{nk} = 1 \mid z_{n-1,j} = 1), \quad \sum_k A_{jk} = 1

• These are known as Transition Probabilities
• The K × K matrix A, with rows indexed by the state j of zn-1 and columns by the state k of zn, has K(K−1) independent parameters (see the sketch below)
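A short sketch (assumed example, not from the slides) of building a row-stochastic K × K transition matrix and checking the normalization constraint; the value K = 4 is arbitrary:

```python
import numpy as np

K = 4
rng = np.random.default_rng(0)
A = rng.random((K, K))
A /= A.sum(axis=1, keepdims=True)     # enforce sum_k A[j, k] = 1 for every row j

assert np.allclose(A.sum(axis=1), 1.0)
# Each of the K rows has K entries but only K-1 are free (they must sum to 1),
# giving K*(K-1) independent parameters in total.
print(K * (K - 1), "independent parameters")
```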
Transition Probabilities
Example with 3-state latent variable (K = 3):

                          zn = 1        zn = 2        zn = 3
                         (1, 0, 0)     (0, 1, 0)     (0, 0, 1)
  zn-1 = 1  (1, 0, 0)      A11           A12           A13
  zn-1 = 2  (0, 1, 0)      A21           A22           A23
  zn-1 = 3  (0, 0, 1)      A31           A32           A33

where each vector is the one-of-K coding (zn1, zn2, zn3) of zn or (zn-1,1, zn-1,2, zn-1,3) of zn-1.
State Transition Diagram
A11+A12+A13=1
• Not a graphical model since nodes are not separate variables but states of a single variable
• Here K=3
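For instance, a hypothetical 3 × 3 transition matrix satisfying the row constraints of the diagram (values chosen purely for illustration):

```python
import numpy as np

A = np.array([[0.8, 0.1, 0.1],    # A11, A12, A13  (A11 + A12 + A13 = 1)
              [0.2, 0.7, 0.1],    # A21, A22, A23
              [0.3, 0.3, 0.4]])   # A31, A32, A33

print(np.allclose(A.sum(axis=1), 1.0))   # True: every row of outgoing probabilities is normalized
```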
Conditional Probabilities
• Transition probabilities Ajk represent state-to-state probabilities for each variable
• Conditional probabilities are variable-to-variable probabilities
  • Can be written in terms of transition probabilities as

  p(z_n \mid z_{n-1}, A) = \prod_{k=1}^{K} \prod_{j=1}^{K} A_{jk}^{\, z_{n-1,j}\, z_{nk}}

  • Note that the exponent zn-1,j zn,k is a product that evaluates to 0 or 1
  • Hence the overall product evaluates to a single Ajk for each setting of values of zn and zn-1 (see the check below)
  • E.g., zn-1 = 2 and zn = 3 results in only zn-1,2 = 1 and zn,3 = 1, thus p(zn=3 | zn-1=2) = A23
• A is a global HMM parameter
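A quick numerical check (hypothetical A matrix, not from the slides) that the product form above picks out the single entry Ajk selected by the one-of-K vectors:

```python
import numpy as np

A = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

z_prev = np.array([0, 1, 0])   # z_{n-1} in state 2 (one-of-K coding)
z_curr = np.array([0, 0, 1])   # z_n in state 3

# The exponent z_{n-1,j} * z_{n,k} is 1 only for (j, k) = (2, 3); all other factors are A_jk^0 = 1.
p = np.prod(A ** np.outer(z_prev, z_curr))
print(p, A[1, 2])              # both equal A_23 = 0.3
```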
Initial Variable Probabilities
• Initial latent node z1 is a special case without a parent node
• Represented by a vector of probabilities π with elements πk = p(z1k = 1) so that

  p(z_1 \mid \pi) = \prod_{k=1}^{K} \pi_k^{\, z_{1k}}, \quad \text{where } \sum_k \pi_k = 1

• Note that π is an HMM parameter representing probabilities of each state for the first variable (see the sketch below)
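Similarly, a small sketch (hypothetical π, not from the slides) showing that p(z1 | π) reduces to the single πk for which z1k = 1:

```python
import numpy as np

pi = np.array([0.5, 0.3, 0.2])            # must satisfy sum_k pi_k = 1
z1 = np.array([0, 0, 1])                  # z_1 in state 3 (one-of-K coding)

p = np.prod(pi ** z1)                     # factors with exponent 0 contribute 1
print(p, pi[2])                           # both equal pi_3 = 0.2
```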
Lattice or Trellis Diagram
• State transition diagram unfolded over time