What is a hidden Markov model used for?

What is a hidden Markov model used for?

Yann LeCun

An introduction to Hidden Markov Model(HMM). In addition, its application in Stock Market analysis!

“The theory of probabilities is at bottom nothing but common sense reduced to calculation”.

Pierre-Simon Laplace

In my previous article, I introduced Markov models and we understood its simplest variant, i.e. Markov Chains, In this article, we will look at one more Markov model called Hidden Markov Models(HMMs).

There are four types of Markov models that are used situationally:

A Markov chain is useful when we need to compute a probability for a sequence of observable events.

In many cases, however, the events we are interested in are hidden: we don’t observe them directly. For example, we don’t normally observe part-of-speech tags in a text. Rather, we see words and must infer the tags from the word sequence. We call the tags hidden because they are not observed. To deal with these kinds of cases comes the Hidden Markov Models(HMMs).

On the left we have a Markov Chain, on the right, we have Hidden Markov Model, where S=states, y=possible observations, P=state transition probabilities, and b=observation probabilities. Credits

Hidden Markov models are especially known for their application in reinforcement learning and temporal pattern recognition such as speech, handwriting, gesture recognition, part-of-speech tagging, musical score following partial discharges, and bioinformatics.

Hidden Markov Models

The Hidden Markov Models(HMMs) is a doubly stochastic process(collection of random variables and defined on a common probability space) where one of the underlying stochastic processes is hidden. The hidden process is a Markov chain going from one state to another but cannot be observed directly. The other process is observable and depends on the hidden states. The goal of HMM is to capture the hidden information from the observables.

An HMM consists of two stochastic processes, namely, an invisible process of hidden states and a visible process of observable symbols.

X – hidden states Y – observables a – transition probabilities: the probability of moving from one state to another state

b – emission probabilities: the probability of an observation being generated from a state Xi – initial probability

distribution: the probability that the Markov chain will start at a certain state

Modeling observations in these two layers, one visible and the other invisible are very useful since many real-world problems deal with classifying raw observations into a number of categories, or class labels, that are more meaningful to us. For example, let us consider the speech recognition problem. Moreover, for which HMMs became extensively used for several decades.

An HMM is specified by the following components:

  • T = length of the observation sequence
  • N = number of states in the model
  • M = number of observation symbols
  • Q = {q0,q1,….,qN-1} = distinct states of the Markov process
  • V = {0,1,….,M – 1} = set of possible observations
  • A = state transition probabilities
  • B = a sequence of observation likelihoods, also called emission probabilities, each expressing the probability of an observation ot being generated from a state i
  • Π = initial state distribution, πi is the probability that the Markov chain will start in state i.
  • O = (O0,O1,….,OT-1) = a sequence of T observations.

A first-order hidden Markov model instantiates two simplifying assumptions:

1. First, as with a first-order Markov chain, the probability of a particular state depends only on the previous state:

Markov Assumption: P(qi |q1…qi−1) = P(qi |qi−1)

2. Second, the probability of an output observation oi depends only on the state that produced the observation qi and not on any other states or any other observations:

Output Independence: P(oi |q1 …qi ,…,qT ,o1,…,oi ,…,oT ) = P(oi |qi)

Initial state distribution gets the model going by starting at a hidden state.

Full model with known State transition probabilities(A). Observation probability matrix(B). And Initial state distribution(π) marked as:

Hidden Markov Model and Stock market trends

Let us take a simple example that teaches you how to figure out market movements solely from the trader’s mood. Here the two hidden states are market rally or drop. Furthermore, the transition matrix stipulates that if the market goes up today, there is an 80% chance that the trend will continue tomorrow and a 20% chance that the trend will reverse. On the contrary, if the market drops today, there is a 70% chance of sliding further tomorrow and a 30% chance of turning back up. Illustrated below:


The emission matrix shows the dependency of the trader’s mood on market changes. If she is just moody (high noise/information ratio), it would tell us nothing about the market. If somehow her mood becomes driven by her PnL. Then we might be able to learn the market from her behavior. In this example, it says that if the market heads up, she would most likely be happy but there is a 10% chance that something else, for example, a car accident in the morning, upsets here. On the contrary, if the market goes down she would be unhappy, unless something else, perhaps today happens to be her anniversary keeps her happy.

With all these uncertainties/probabilities in place, there are three questions to ask, in the order of complexity:
  • Likelihood: What is the odds that she is happy three days in a row? or in other words, Given a known HMM model, λ = (A, B) and an observation sequence O, determine the likelihood of the sequence O happening, P(O|λ).

Solved by The Forward Algorithm.

  • Decoding: Given that she is happy three days in a row, what would be the most likely market movements in the last three days? or in other words, Given an HMM model, λ = (A, B) and an observation sequence O, determine the best or optimal hidden state sequence.

Solved by The Viterbi Algorithm.

  • Learning: If we don’t know any of the probabilities labeled next to the arrows in the figure, given that she is happy three days in a row, what would be our best estimates of these probabilities (model parameters)? or in other words, Given an observation sequence O and a set of hidden states in the HMM, learn the parameters A and B while determining the optimal model maximizing the probability of O.

Solved by The Baum-Welch Algorithm

Likelihood Computation: The Forward Algorithm

The Forward algorithm computes the posterior marginals of all hidden state variables. This means that it gives the probability distribution with relevant evidence and each background being taken into account. To achieve this it makes use of the principle of dynamic programming, which means simplifying a complicated problem by breaking it down into simpler sub-problems

Conclusion: The Forward-Backward algorithm is used to find the most likely state for any point in time.

In particular, the hidden Markov Model λ=(A, B, π) is given by:

The first question is, given we know λ, what are the odds that she is happy three days in a row? This is answered by the forward procedure or backward procedure and is solved recursively.

On day 1, there is a 45% chance to observe a happy face (H) and the market is up; there is a 20% chance to observe a happy face (H) and the market is down. So in total, there is a 65% chance that she would be happy and a 35% chance that she would be sad (S). Similarly, the probability of seeing HH is 47%. Other possibilities (that is, HH, HS, and SS) add up to 53%.

Finally, on day three, the answer to our question becomes 35.3%.

Decoding: The Viterbi Algorithm

The Viterbi algorithm is most useful when one wants to calculate the most likely path through the state transitions of these models over time. Let’s say we have N states and T moments in time, calculating the probabilities of all transitions over time would be N probability calculations. This algorithm speeds up all these calculations.

The observation made by the Viterbi algorithm is that for any state at time T, there is only one most likely path to that state. Therefore, if several paths converge at a particular state at time T, instead of redoing them all when calculating the transitions from this state to states at time T+1, one can discard the less likely paths, and only use the most likely one.

Conclusion: Using the Viterbi algorithm we can identify the most likely sequence of hidden states given the sequence of observations.

On day 1, the table is initialized. Then on days 2 and day3, it uses dynamic programming to find the optimal probability and states recursively. Finally, the most probable hidden states for the three days are {‘Up’, ‘Up’, ‘Up’} with a maximum probability of 23.328%. In other words, if she is happy three days in a row, most likely the market is also on a three-day winning streak.


import os
import pandas as pd
import numpy as np
from matplotlib import cm, pyplot as plt
from matplotlib.dates import YearLocator, MonthLocator
from datetime import datetime, date
from hmmlearn.hmm import GaussianHMM

obs = ('happy', 'happy', 'happy')
states = ('Up', 'Down')
start_p = {'Up': 0.5, 'Down': 0.5}
trans_p = {
    'Up' : {'Up': 0.8, 'Down': 0.2},
    'Down' : {'Up': 0.3, 'Down': 0.7}
emit_p = {
    'Up' : {'happy': 0.9, 'unhappy': 0.1},
    'Down' : {'happy': 0.4, 'unhappy': 0.6}

#Declare Viterbi function
def viterbi(obs, states, start_p, trans_p, emit_p):
    V = [{}]
    for st in states:
        V[0][st] = {"prob": start_p[st] * emit_p[st][obs[0]], "prev": None}
    # Run Viterbi when t > 0
    for t in range(1, len(obs)):
        for st in states:
            max_tr_prob = max(V[t-1][prev_st]["prob"]*trans_p[prev_st][st] for prev_st in states)
            for prev_st in states:
                if V[t-1][prev_st]["prob"] * trans_p[prev_st][st] == max_tr_prob:
                    max_prob = max_tr_prob * emit_p[st][obs[t]]
                    V[t][st] = {"prob": max_prob, "prev": prev_st}
    for line in dptable(V):
    opt = []
    # The highest probability
    max_prob = max(value["prob"] for value in V[-1].values())
    previous = None
    # Get most probable state and its backtrack
    for st, data in V[-1].items():
        if data["prob"] == max_prob:
            previous = st
    # Follow the backtrack till the first observation
    for t in range(len(V) - 2, -1, -1):
        opt.insert(0, V[t + 1][previous]["prev"])
        previous = V[t + 1][previous]["prev"]
    print('The steps of states are ' + ' '.join(opt) + ' with highest probability of %s' % max_prob)

def dptable(V):
        # Print a table of steps from dictionary
        yield " ".join(("%12d" % i) for i in range(len(V)))
        for state in V[0]:
            yield "%.7s: " % state + " ".join("%.7s" % ("%f" % v[state]["prob"]) for v in V)

viterbi(obs, states, start_p, trans_p, emit_p)


HMM Training: The Forward-Backward Algorithm

The Forward-Backward algorithm or the Baum-Welch algorithm becomes useful to find the unknown parameters of a hidden Markov model. It’s a special case of the EM algorithm (expectation-maximization algorithm) which is a method to find maximum a posteriori estimates of parameters in a statistical model.

The Baum-Welch algorithm become needed because the state paths stay hidden. In addition, the equations one cannot solve analytically.

Conclusion: The Baum-Welch algorithm attempts to find the model that assigns the training data the highest likelihood.

The well-known Baum–Welch algorithm takes the following steps,
  1. Calculate forward probabilities αi(t) with the forward algorithm
  2. Calculate backward probabilities βi(t) with the backward algorithm
  3. Expectation — calculate the probability of being in state i at time t, γi(t); and the probability of being in state i and j at times t and t+1, ξij(t)
  4. Maximization — update the new model parameters (start probabilities, transition probabilities, emission probabilities)
  5. Repeat steps 1 through 4 until the change in log-likelihood converges.
We will estimate latent variables [ ξ,γ ].

Back to our example, assume λ in the figure above is not the actual but our current belief. Then we have done steps 1 and 2 as in the first question. To proceed.

Step 4 updates the new normalized parameters as follows,

These steps are now repeated iteratively until it converges.


You can get download the data fromhere.

#Load the data
hist_file = "../input/spindex/HistoricalData_1637140863646.csv"
spx_price = pd.read_csv(hist_file, header=0, parse_dates=True, sep=',', index_col=0)
spx_price = spx_price['Close']
spx_price.name = 'SPX Index'
spx_ret = spx_price.shift(1)/ spx_price[1:] - 1
#spx_ret = spx_ret * 1000.0
rets = np.column_stack([spx_ret])

# Create the Gaussian Hidden markov Model and fit it
# to the SPY returns data, outputting a score
hmm_model = GaussianHMM(
    n_components=3,                     # number of states
    covariance_type="full",             # full covariance matrix vs diagonal
    n_iter=1000                         # number of iterations

print("Model Score:", hmm_model.score(rets))

# Plot the in sample hidden states closing values
# Predict the hidden states array
hidden_states = hmm_model.predict(rets)
print("Number of hidden states:",len(hidden_states))
print('\nPercentage of hidden state 1 = %f' % (sum(hidden_states)/len(hidden_states)))

print("\nTransition matrix")

print("\nMeans and vars of each hidden state")
for i in range(hmm_model.n_components):   
    # 0 is down, 1 is up
    print("\n{0}th hidden state".format(i))
    print("mean = ", hmm_model.means_[i])
    print("var = ", np.diag(hmm_model.covars_[i]))

fig, axs = plt.subplots(hmm_model.n_components, figsize=(16,8),sharex=True, sharey=True)
colours = cm.rainbow(np.linspace(0, 1, hmm_model.n_components))
for i, (ax, colour) in enumerate(zip(axs, colours)):
    # Use fancy indexing to plot data in each state.
    mask = hidden_states == i
    ax.plot_date(spx_ret.index[mask], spx_price.loc[spx_ret.index][mask], ".", linestyle='dotted', c=colour)
    ax.set_title("{0}th hidden state".format(i))

    # Format the ticks.


Advantages and Disadvantages of Hidden Markov Model

The Hidden Markov Model(HMM) has become a very prominent mathematical and graphical representation for appliances. Here’s an analysis of the advantages and disadvantages of the Hidden Markov Model:


  1. HMM is an analyzed probabilistic graphical model. The algorithms applied in this model are studied for approximate learning and conclusion.
  2. Hidden Markov Models (HMM) are said to acquire the contingency between successive measurements. As defined in the switch continuity principle.
  3. HMMs represent the variance of appliances’ power demands via probability distributions.


  1. HMM cannot represent any dependency between the appliances. The conditional HMM can capture the dependencies, though.
  2. HMM does not consider the state sequence dominating any given state because of its Markovian nature.
  3. HMMs do not explicitly capture the time in a specified state due to their Markovian behavior. Nonetheless, the hidden semi-Markov model is responsible for capturing that kind of behavior.

Back To News

Written by Nagesh Singh Chauhan