Sargur N. Srihari
srihari@cedar.buffalo.edu
Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/CSE574/index.html
HMM Topics
1. What is an HMM?
2. State-space Representation
3. HMM Parameters
4. Generative View of HMM
5. Determining HMM Parameters Using EM
6. Forward-Backward or α−β algorithm
7. HMM Implementation Issues:
   a) Length of Sequence
   b) Predictive Distribution
   c) Sum-Product Algorithm
   d) Scaling Factors
   e) Viterbi Algorithm
1. What is an HMM?
• Ubiquitous tool for modeling time series data
• Used in
  • Almost all speech recognition systems
  • Computational molecular biology
    • Group amino acid sequences into proteins
  • Handwritten word recognition
• It is a tool for representing probability distributions over sequences of observations
• HMM gets its name from two defining properties:
  • Observation xt at time t was generated by some process whose state zt is hidden from the observer
    • Example: z are phoneme sequences, x are acoustic observations
  • Assumes that state zt depends only on state zt-1 and is independent of all prior states (first order)
Graphical Model of HMM
• Has the graphical model shown below (a chain of latent variables zn, each emitting an observation xn); the latent variables are discrete
• Joint distribution has the form:
  p(x_1, \ldots, x_N, z_1, \ldots, z_N) = p(z_1) \left[ \prod_{n=2}^{N} p(z_n \mid z_{n-1}) \right] \prod_{n=1}^{N} p(x_n \mid z_n)
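As a concrete illustration (not from the slides), this factorization can be evaluated directly for a toy discrete-observation HMM; the parameters pi, A, B and the helper joint_log_prob below are hypothetical example choices:

```python
import numpy as np

# Hypothetical toy HMM with 2 states and 2 observation symbols (illustrative only)
pi = np.array([0.6, 0.4])                  # p(z_1): initial state probabilities
A = np.array([[0.7, 0.3],                  # A[j, k] = p(z_n = k | z_{n-1} = j)
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],                  # B[k, v] = p(x_n = v | z_n = k) (emission)
              [0.3, 0.7]])

def joint_log_prob(z, x):
    """log p(x_1..x_N, z_1..z_N) for a given state path z and observation sequence x."""
    lp = np.log(pi[z[0]]) + np.log(B[z[0], x[0]])      # p(z_1) * p(x_1 | z_1)
    for n in range(1, len(z)):
        lp += np.log(A[z[n - 1], z[n]]) + np.log(B[z[n], x[n]])   # p(z_n|z_{n-1}) p(x_n|z_n)
    return lp

print(joint_log_prob(z=[0, 0, 1], x=[0, 0, 1]))
```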
HMM Viewed as Mixture
• A single time slice corresponds to a mixture distribution with component densities p(x|z)
  • Each state of discrete variable z represents a different component
• An extension of mixture model
  • Choice of mixture component depends on choice of mixture component for previous distribution
• Latent variables are multinomial variables zn
  • That describe component responsible for generating xn
  • Can use one-of-K coding scheme (sketched below)
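A minimal sketch of the one-of-K coding scheme mentioned above (illustrative only; the helper one_of_K is not from the slides):

```python
import numpy as np

def one_of_K(state, K):
    """Return the one-of-K binary vector for a latent variable in the given state (1-indexed)."""
    z = np.zeros(K, dtype=int)
    z[state - 1] = 1          # exactly one component is set: the active state / mixture component
    return z

print(one_of_K(state=2, K=3))   # -> [0 1 0], i.e. z_n is in state 2
```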
2. State-Space Representation
• Probability distribution of zn depends on the state of the previous latent variable zn-1 through the probability distribution p(zn | zn-1)
• One-of-K coding: e.g., zn in state 2 is the binary vector (zn1, ..., znK) = (0, 1, ..., 0), and zn-1 in state 1 is (zn-1,1, ..., zn-1,K) = (1, 0, ..., 0)
• Since latent variables are K-dimensional binary vectors,

  A_{jk} = p(z_{nk} = 1 \mid z_{n-1,j} = 1), \quad \sum_k A_{jk} = 1

• These are known as Transition Probabilities
• The K × K matrix A, with rows indexed by the state j of zn-1 and columns by the state k of zn, has K(K−1) independent parameters (see the sketch below)
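A short sketch (assumed example, not from the slides) of building a row-stochastic K × K transition matrix and checking the normalization constraint; the value K = 4 is arbitrary:

```python
import numpy as np

K = 4
rng = np.random.default_rng(0)
A = rng.random((K, K))
A /= A.sum(axis=1, keepdims=True)     # enforce sum_k A[j, k] = 1 for every row j

assert np.allclose(A.sum(axis=1), 1.0)
# Each of the K rows has K entries but only K-1 are free (they must sum to 1),
# giving K*(K-1) independent parameters in total.
print(K * (K - 1), "independent parameters")
```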
Transition Probabilities
Example with 3-state latent variable (K = 3):

                          zn = 1        zn = 2        zn = 3
                         (1, 0, 0)     (0, 1, 0)     (0, 0, 1)
  zn-1 = 1  (1, 0, 0)      A11           A12           A13
  zn-1 = 2  (0, 1, 0)      A21           A22           A23
  zn-1 = 3  (0, 0, 1)      A31           A32           A33

where each vector is the one-of-K coding (zn1, zn2, zn3) of zn or (zn-1,1, zn-1,2, zn-1,3) of zn-1.
State Transition Diagram
A11+A12+A13=1
• Not a graphical model since nodes are not separate variables but states of a single variable
• Here K=3
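For instance, a hypothetical 3 × 3 transition matrix satisfying the row constraints of the diagram (values chosen purely for illustration):

```python
import numpy as np

A = np.array([[0.8, 0.1, 0.1],    # A11, A12, A13  (A11 + A12 + A13 = 1)
              [0.2, 0.7, 0.1],    # A21, A22, A23
              [0.3, 0.3, 0.4]])   # A31, A32, A33

print(np.allclose(A.sum(axis=1), 1.0))   # True: every row of outgoing probabilities is normalized
```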
Conditional Probabilities
• Transition probabilities Ajk represent state-to-state probabilities for each variable
• Conditional probabilities are variable-to-variable probabilities
  • Can be written in terms of transition probabilities as

  p(z_n \mid z_{n-1}, A) = \prod_{k=1}^{K} \prod_{j=1}^{K} A_{jk}^{\, z_{n-1,j}\, z_{nk}}

  • Note that the exponent zn-1,j zn,k is a product that evaluates to 0 or 1
  • Hence the overall product evaluates to a single Ajk for each setting of values of zn and zn-1 (see the check below)
  • E.g., zn-1 = 2 and zn = 3 results in only zn-1,2 = 1 and zn,3 = 1, thus p(zn=3 | zn-1=2) = A23
• A is a global HMM parameter
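A quick numerical check (hypothetical A matrix, not from the slides) that the product form above picks out the single entry Ajk selected by the one-of-K vectors:

```python
import numpy as np

A = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

z_prev = np.array([0, 1, 0])   # z_{n-1} in state 2 (one-of-K coding)
z_curr = np.array([0, 0, 1])   # z_n in state 3

# The exponent z_{n-1,j} * z_{n,k} is 1 only for (j, k) = (2, 3); all other factors are A_jk^0 = 1.
p = np.prod(A ** np.outer(z_prev, z_curr))
print(p, A[1, 2])              # both equal A_23 = 0.3
```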
Initial Variable Probabilities
• Initial latent node z1 is a special case without a parent node
• Represented by a vector of probabilities π with elements πk = p(z1k = 1) so that

  p(z_1 \mid \pi) = \prod_{k=1}^{K} \pi_k^{\, z_{1k}}, \quad \text{where } \sum_k \pi_k = 1

• Note that π is an HMM parameter representing probabilities of each state for the first variable (see the sketch below)
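Similarly, a small sketch (hypothetical π, not from the slides) showing that p(z1 | π) reduces to the single πk for which z1k = 1:

```python
import numpy as np

pi = np.array([0.5, 0.3, 0.2])            # must satisfy sum_k pi_k = 1
z1 = np.array([0, 0, 1])                  # z_1 in state 3 (one-of-K coding)

p = np.prod(pi ** z1)                     # factors with exponent 0 contribute 1
print(p, pi[2])                           # both equal pi_3 = 0.2
```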
Lattice or Trellis Diagram
• State transition diagram unfolded over time